ABSTRACT


When multiple database schemas are integrated, there are often conflicts in the naming of 
attributes within the schemas. These conflicts must be detected and resolved prior to successful 
integration of the schemas. This thesis describes a method for automatically detecting such 
naming conflicts, which adapts and enhances a method for detecting similar conflicts in 
(mathematical) model integration. The method relies on the representation of semantic 
information, not found in data dictionaries, about the data elements or attributes present in the 
various schemas. This information about data elements is then used by mechanical inference 
procedures to automatically determine whether two distinctly named elements in fact represent 
the same object (the synonym problem), or if data elements with the same name in different 
schemas actually represent different objects (the homonym problem). The expected accuracy and 
errors of these procedures, and results obtained from a set of experiments on the use of this 
method, are also presented. 

I. INTRODUCTION


This thesis examines and develops a method for automatically detecting possible naming 
problems of data elements prior to database integration. These naming problems or conflicts 
involve synonyms and homonyms. Synonyms are data elements in two or more different databases 
which are given different names but contain information about the same thing. Homonyms are 
data elements in two or more different databases which are given the same names but contain 
information about different things. For example, one database might call the data element which 
contains a person's given name, "FIRSTNAME," while another calls it "FNAME." Conversely, 
that same database might call the data element which contains the last name of an individual, 
"NAME," while another calls the data element which contains the full name of an individual, 
"NAME." 

These naming problems have been identified in database literature and it is accepted that 
prior to schema integration, naming problems must be detected and resolved (e.g., Bhargava, 
Kimbrough, and Krishnan 1990; Kamel and Hsiao 1990; Batini, Lenzerini, and Navathe 1986; 
Wang and Madnick 1989; Larson, Navathe, and Elmasri 1989; and Hayne and Ram 1990). 
Further, several methodologies have been proposed for detection (e.g., Batini et al. 1986; Larson 
et al. 1989; Mannino and Effelsberg 1984), but are not supported with automated tools. The 
methodologies require the database designer or administrator to detect the problems by 
systematically examining the data elements in each database. The database designer or 
administrator is able to locate many of these problems by reviewing the data dictionaries and 
additional information about the databases from other sources (e.g., users). These manual 
methods, however, can be tedious and extremely time-consuming. 


More recently, a method for supporting automatic detection of possible naming problems 
in model integration has been proposed by Bhargava, Kimbrough, and Krishnan, hereafter referred 
to as Bhargava et al. (1990). This method requires a database designer to further define each 
model variable by providing dimensional information and information, called "quiddity" (see 
Chapter III), about the nature or essence of the data contained in the variable. This thesis 
addresses the applicability of quiddity in detecting naming conflicts prior to database integration. 


A. BACKGROUND 

A database is simply a computerized record-keeping system. The information in a database 
is stored (at the lowest level) in units called data elements or data items. Each data element has 
a unique name associated with it. For example, the data element which contains an individual's 
social security number could be called "SSN." Data elements also have other assigned 
characteristics such as type and size. The type tells whether the data element is alphabetic, 
numeric, or a special character. The size describes the length of the data element, e.g., the 
number of characters that fit in the field. All this information about data, including relationships 
between the data elements, makes up the database schema. The schema provides a complete and 
logical view of the database. (Date 1990) 

With the proliferation of database technology, many organizations need to access and share 
information between databases to facilitate decision making, operations planning and control, and 
strategic planning. 


This situation has led to the emergence of the heterogeneous distributed database scenario. 
In this scenario, a variety of large and small computers, each with its own autonomous and 
often incompatible DBMS [Database Management System], may be tied together in a 
network. This network could consist of local area, wide area, and long-haul networks. 
Under current technology, however, a user accessing any database in this network must 
abide by the syntactic and semantic rules of that database (Cardenas 1985). ... A true 
heterogeneous DBMS should support an environment in which any user in the network is 
given an integrated and tailored view, while in fact the data could physically reside on a 
single or several databases managed by different and possible heterogeneous DBMS. This 
level of data access and sharing is known as database integration. (Kamel et al. 1990) 


One of the steps to be performed before database integration can occur is schema 
integration, i.e., the integration of the local schemas of the databases involved into one global 
schema. Several different problems are encountered during schema integration. The one we are 
primarily concerned with involves the different ways similar information is captured in the 
databases being integrated. The fact that the same data may be described differently in each local 
schema presents some very challenging issues. Zviran and Kamel (1989) classify these issues into 
four general areas. They are: 


1. Name Conflicts. These conflicts exist when there are synonyms, i.e., data elements with 
different names but representing the same concept, and homonyms, i.e., data elements 
with the same name but representing different concepts. For example, one database 
might call the data element which contains a social security number, "SSN," while another 
calls it "SSNO" (synonym). Or, two databases might both have data elements called 
"DATE" but in the first database the date represents the current date while in the other 
it represents an individual's employment date (homonym). 


2. Structural Conflicts. These conflicts exist when the same information is represented 
in different structures in each schema. For example, an individual's full name (first, 
middle, and last) is maintained as a single data element in one database but is split into 
three data elements in another. 


3. Scale Conflicts. These conflicts exist when the same facts are expressed in different 
units of measure in each schema. For example, a person's height may be captured in 
inches in one schema and feet in another. 

4. Conflicts in Application Semantics. These conflicts exist when perceptions about 
information differ between schemas. For example, the relationship between two objects 
in a schema is represented as "one to many," but is represented in another schema as "one 
to one." 


Identifying and resolving these conflict issues is a critical step in successful schema integration. 


B. PROBLEM DESCRIPTION 

While technological improvements have kept pace with the increased requirements for 
exchange and sharing of information, automated tools or methods to facilitate the physical 
integration of the data prior to exchange, i.e., schema integration, have been slow in coming. For 
example, those knowledgeable in the process of database integration continually emphasize the 
importance of identifying naming conflicts among databases prior to integration but fail to provide 
tools with which to accomplish this crucial yet painstaking task (Bhargava et al. 1990). 

The problem of naming conflicts, i.e., violations of the unique names assumption¹ (unique 
names violations, or UNVs), occurs in database integration when there are synonyms or homonyms among the data 
elements. For example, one database might call the data element which contains a social 
security number, "SSN," while another calls it "SSNO" (synonym) or that same database might 
call the data element which contains the last name of an individual, "NAME," while another calls 
the data element which contains the full name of an individual, "NAME" (homonym). Another 
interesting twist to the problem is that it is possible to have data elements with different names 
containing information about the same thing but having different values. This can happen when 
there are small measurement errors. For example, two databases with data elements called 
"HEIGHT" and "HT," respectively, capturing information about the "height" of the same person 
could have different values, e.g., 68" versus 67". If the same data element name had been used, 
the problem of two different values would be easily detected. Not resolving these and similar 
conflicts before integrating would result in a database which clearly has redundant data (e.g., two 
data elements containing social security numbers) and would in all likelihood develop serious 
consistency problems (e.g., similar fields with different values). 

How are these conflicts detected? There are two basic methods currently used in identifying 
these conflicts. In the first method, the data element names are compared syntactically, and data 
types (e.g., numeric, alphabetic) and field lengths are matched. The second method involves an 
examination of the data dictionary. The data dictionary has more descriptive, semantic information 
about the data elements. However, it is written in non-formal, human language, which is not 
amenable to machine inference. 

¹ "That every individual has at most one name, unless stated otherwise, is often a useful 
and convenient assumption in software systems, and is called the unique name assumption." 
(Bhargava et al. 1990) 
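
To make the first, syntactic method concrete, the following is a minimal sketch and not a tool described in the literature cited here. It assumes a hypothetical DataElement record holding only a name, a type, and a field length, and it uses an arbitrary similarity threshold of 0.7; both are our own illustrative choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class DataElement:
    """Hypothetical schema entry: name, type, and field length."""
    name: str
    dtype: str   # e.g., "numeric" or "alphabetic"
    size: int    # field length in characters


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two element names (case-insensitive)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def syntactic_candidates(schema_a, schema_b, threshold=0.7):
    """Pair elements whose names look alike and whose type and size match.

    The threshold is an assumed, illustrative value.
    """
    pairs = []
    for x in schema_a:
        for y in schema_b:
            same_shape = x.dtype == y.dtype and x.size == y.size
            if same_shape and name_similarity(x.name, y.name) >= threshold:
                pairs.append((x.name, y.name))
    return pairs


schema_a = [DataElement("SSN", "numeric", 9), DataElement("HEIGHT", "numeric", 2)]
schema_b = [DataElement("SSNO", "numeric", 9), DataElement("HT", "numeric", 2)]
candidates = syntactic_candidates(schema_a, schema_b)
```

With these sample elements the sketch pairs "SSN" with "SSNO" but not "HEIGHT" with "HT", which illustrates why purely syntactic matching can miss synonyms whose names share few characters.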

How can we identify these conflicts through automation? Clearly, we need more semantic 
information: information about what the data element represents. Bhargava et al.'s (1990) 
method for supporting automatic detection of possible naming problems requires that each data 
element be further defined in terms of dimensional information and information about the nature 
or essence (quiddity) of the data contained in the data element. The quiddity of a data element 
is specified using various rules of formulation and a given vocabulary. The objective of this 
approach is to identify pairs of data elements with possible unique names violations by comparing 
the dimensional information and quiddity of each data element in the databases being integrated. 
The premise of this approach is that if two data elements have the same quiddity, it is fairly likely 
that they refer to the same concept. This automated approach will not specifically identify naming 
conflicts; rather, it will result in a list of possible conflicts. Human interaction is required to 
confirm specific conflicts. The intent is to develop a list of possible conflicts which comes as close 
as possible to the "correct list." 
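
The comparison idea can be sketched as follows. This is only an illustration under our own assumptions, not the procedure of Bhargava et al. (1990): quiddities and dimensions are treated as plain strings compared for exact equality, the Element record is hypothetical, and the sample quiddity strings other than last(name(person)) are invented for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Element:
    """Hypothetical data element annotated with dimension and quiddity."""
    name: str
    dimension: str
    quiddity: str


def possible_conflicts(schema_a, schema_b):
    """Return (kind, name_a, name_b) for each pair that merits human review."""
    flagged = []
    for x, y in product(schema_a, schema_b):
        same_meaning = (x.dimension, x.quiddity) == (y.dimension, y.quiddity)
        same_name = x.name == y.name
        if same_meaning and not same_name:
            # Different names, same declared meaning: possible synonym.
            flagged.append(("synonym", x.name, y.name))
        elif same_name and not same_meaning:
            # Same name, different declared meaning: possible homonym.
            flagged.append(("homonym", x.name, y.name))
    return flagged


# Assumed sample data; "1" stands for a dimensionless quantity.
schema_a = [
    Element("NAME", "1", "last(name(person))"),
    Element("SSN", "1", "ssn(person)"),
]
schema_b = [
    Element("NAME", "1", "full(name(person))"),
    Element("SSNO", "1", "ssn(person)"),
]
review_list = possible_conflicts(schema_a, schema_b)
```

The output is a list of candidates only; as stated above, a person must still confirm each one.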

It is fairly straightforward to provide dimensional information for each data element because 
there is a finite set of dimensions (e.g., length, mass, time, volume). However, the quiddity of 
each data element is more complex to define. Quiddity must be stated in a well defined form, a 
type of formal language, in order to be read and compared by a computer. For example, the 
quiddity of the data element "NAME" (referring to the last name of an individual) discussed above, 
would be "last(name(person))."² 

² The representation of quiddity is discussed in greater detail in Chapter III. 


C. THESIS OBJECTIVES 
The aim of this thesis is to examine several aspects of quiddity, broadly classified into those 
dealing with quiddity concept definition, quiddity acquisition, and quiddity manipulation and 
inferencing procedures.³ Specifically, the following questions in each area will be addressed. 
1. Quiddity Concept Definition 
Is the idea of quiddity, as defined by Bhargava et al., a practical tool for use in the 
integration of databases? Specifically, is the concept of quiddity sufficiently rich or expressive in 
the database context? If not, in what ways can the concept be modified to one that is rich enough? 
Can quiddities provide a basis for automatically detecting unique name violations, or will the 
quiddities create more problems than they solve? 
2.  Quiddity Acquisition 
Can this method be easily understood and applied by database designers? In other 
words, will two individuals always develop the same or equivalent quiddity definition given identical 
data elements, information, and training? If not, how can the acquisition process be supported? 
3.  Quiddity Manipulation and Inferencing Procedures 
What kinds of inference procedures can be defined to utilize this quiddity information 
in order to automatically detect naming conflicts? How can these procedures be implemented? 
What are the accuracy and error rates of these procedures, in terms of Type I and Type II errors?⁴ 

³ Bhargava et al. (1990) have discussed a formal, functional representation for quiddities. For 
our purposes, a less formal tabular notation will suffice. Hence, this thesis is not concerned with 
issues in quiddity representation. 

⁴ A Type I error is indicating a naming problem when there is none. A Type II error is failing 
to indicate a naming problem when there is one. 
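
As a small illustration of these two error types, the sketch below counts them by comparing a list of flagged pairs against a list of actual conflicts. The function name and the sample pairs are our own assumptions; the sketch does not reproduce the measurements reported in this thesis.

```python
def error_counts(flagged, actual):
    """Count Type I and Type II errors for a detection procedure.

    flagged: pairs the procedure reported as possible naming problems.
    actual:  pairs that really are naming problems (the "correct list").
    """
    flagged_set, actual_set = set(flagged), set(actual)
    type_1 = len(flagged_set - actual_set)  # reported, but no real problem
    type_2 = len(actual_set - flagged_set)  # real problem, but not reported
    return type_1, type_2


# Assumed sample data for illustration only.
flagged = [("SSN", "SSNO"), ("NAME", "NAME"), ("ID", "CODE")]
actual = [("SSN", "SSNO"), ("NAME", "NAME"), ("HEIGHT", "HT")]
type_1, type_2 = error_counts(flagged, actual)
```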


D. METHODOLOGY 
The research for this thesis follows these steps: 

1. Conduct preliminary experiment. In this experiment, explain the concept of quiddity to 
a group of six Computer Systems Management students and ask them to then develop 
"quiddities" for a sample set of data elements in a database. This first experiment will be 
primarily used to ensure that all subjects understand the concept and what is being asked 
of them, in other words, to eliminate any "noise" which could interfere with the analysis 
of the concept itself. 

2. Refine, analyze, and enhance the concept based on results of preliminary experiment. 
Provide feedback to students on "correctness" of their experiment answers. Develop and 
present several procedures for comparing the quiddities in the experiment. 

3. Conduct primary experiment with the same individuals who participate in the first 
experiment, using a new sample set of data elements. Present any new rules or 
instructions in developing quiddity to the students based on the analysis and any 
enhancements developed in step 2. 


4. Analyze the results of the primary experiment by applying the comparison procedures 
developed in step 2 to the students’ quiddity definitions. 


5. Evaluate results of primary experiment, discussing any shortcomings in the quiddity 
concept or inference procedures. Discuss future areas of research. 


E. THESIS STRUCTURE 

Our research is presented in five chapters. Chapter II provides a general review of related 
work in detecting naming conflicts in database integration. Several issues related to our proposed 
method for UNV detection are addressed in Chapter III. Section A presents a detailed description 
of this proposed method. Results of a preliminary experiment along with a refined concept based 
on the experiment analysis appear in Section B. Finally, Section C discusses several quiddity 
manipulation and inferencing procedures. Chapter IV describes the primary experiment and 
presents detailed experiment results and analyses. Chapter V presents our conclusions and 
suggests issues for future research. 


II. REVIEW OF RELATED WORK 


Our aim in this chapter is to present a general overview of current literature pertaining to 
database integration, with emphasis on works which address methods for the detection of naming 
conflicts or present automated tools for use in detecting such conflicts. 


A. SCHEMA INTEGRATION 

The literature to date views schema integration in two contexts. The first, commonly 
referred to as view integration, generates a global conceptual description or logical integrated 
schema of a proposed database during database design. The second, referred to as database 
integration, generates the global schema of a group of databases in distributed database 
management. (Batini et al. 1986) 

Kamel et al. (1990) have reviewed and grouped current literature into the context of view 
integration and database integration. Most previous research has focused on schema integration in the 
context of view integration (Batini, Lenzerini, and Moscarini 1983; Elmasri and Navathe 1984; 
Elmasri and Wiederhold 1979; Motro and Buneman 1981; Navathe, Sashidhar, and Elmasri 1984; and 
Sheth, Larson, Cornelio, and Navathe 1987), while some studies have addressed issues of schema 
integration in the context of database integration (Kamel et al. 1990; Dayal and Hwang 1984; 
Deen, Amin, and Taylor 1987; DeMichiel 1989; and Wang and Madnick 1989). Batini et al. (1986) 
have also provided a general survey on view integration methodologies. 

Schema integration, regardless of context, involves many complex issues. One of these 
issues is conflict identification and resolution, specifically, conflicts in name or unique names 
violations. Although methodologies do address this issue, Bhargava et al. (1990) state that most 
methods assume that unique names violations are dealt with prior to integration (Casanova and 
Vidal 1983; Yao, Waddle, and Housel 1982). Others have suggested that naming conflicts can be 
easily handled simply by renaming (Dayal and Hwang 1984), but have not proposed how to handle 
the conflicts. 


B. APPROACHES ADDRESSING NAMING CONFLICTS 

Larson et al. (1989) propose a method of schema integration which provides assistance in 
the detection of naming conflicts. (This methodology builds on previous work: Elmasri and 
Navathe 1984; Navathe, Sashidhar, and Elmasri 1984; and Elmasri, Larson, and Navathe 1986.) 
Basically, this method involves the application of certain criteria to attributes (data elements) in 
order to determine "attribute equivalence." Equivalent attributes have several characteristics in 
common and can be integrated. Examples of attribute characteristics considered are uniqueness, 
cardinality, domain, static and dynamic semantic integrity constraints, security constraints, 
allowable operations, scale, and others that a database administrator feels are important. Then, 
based on certain equivalence properties, the attributes are integrated. This concept is also used 
to define object and relationship set equivalences for integration purposes. The criteria for 
attribute equivalence are applied to naming conflicts, which can then be identified and resolved. 
However, this is a tedious, manual process. 

Mannino and Effelsberg (1984) have suggested an integration process using assertions, made 
by database designers, about semantic equivalence between objects. While very similar to Larson 
et al. (1989) above, this methodology is not as detailed in its treatment of equivalence. Here again, 
naming conflicts are found through a manual process. 


C. AUTOMATED TOOLS FOR SCHEMA INTEGRATION 
Larson et al. (1989) have designed and implemented a schema integration tool based 
partially on their concept described in Section B. With this tool, the database administrator is 
shown descriptions of the schemas being integrated. The database administrator then specifies 
all equivalences between schema objects and interactively integrates the schema. While this is a 
step toward automating the process, the database administrator must still "manually" establish all 
equivalence characteristics before the schema is "automatically" integrated. 

Hayne and Ram (1990) have developed a knowledge based system called MUVIS (Multi- 
User View Integration System) to support the design of distributed object-oriented databases. 
This system automates the view integration process as proposed by Navathe et al. (1986). MUVIS 
aids designers in modeling user views using the Extended Entity Relationship Model and 
integrating these views into a global conceptual view. MUVIS's expert system compares objects 
and computes equivalence assertions about these objects using heuristics. Integration rules are 
then applied and the designer confirms the integration. The designer determines whether there 
is a naming conflict prior to integrating when he or she confirms the integration. 

Hayne and Ram (1990) also reviewed other design tools that are currently available. Several 
design tools for view modeling and integration have been implemented using the expert system 
approach. These systems (Bouzeghoub, Gardarin, and Metais 1985; Choobineh, Mannino, 
Nunamaker, and Konsynski 1988; and Dogac, Yuruten, and Spaccapietra 1989) do not provide 
graphical interfaces but do allow the specification of incomplete designs and can justify and explain 
results produced. Again, these tools may automate part of the integration process but do not 
automate the actual detection of naming conflicts. These conflicts are found through interaction 
between the designer and the tool. 


III. USE OF QUIDDITIES IN AUTOMATIC DETECTION OF NAMING PROBLEMS 


Our aim in this chapter is to describe in detail a proposed method wherein quiddities of data 
elements are declared and used in the automatic detection of naming conflicts. This idea was first 
developed by Bhargava et al. (1990), and we present a summary of their approach in Section A. 
Section B presents the results of a preliminary experiment conducted to provide initial data about 
the applicability of the method in detecting naming conflicts. These results, a deeper analysis, and 
a linguistic perspective are employed to propose refinements to the method. Section C proposes 
quiddity manipulation and inferencing procedures to be used by the automated process. 


A. CONCEPT AND MOTIVATION 

Bhargava, Kimbrough, and Krishnan (1990) have proposed a method for supporting 
automatic detection of possible naming problems, specifically, unique names violations in model 
integration.⁵ Their contention is that in order for any automated system to recognize that two 
variables with different names represent the same information, or vice versa, the system requires 
more information about these variables. This method attempts to develop a principled means of 
providing and expressing that information. It requires capturing two categories of information 
about each variable, its dimension and quiddity. The premise is that if two syntactically distinct 
variables have the same or equivalent dimension and quiddity, a possible unique names violation 
is indicated. (Bhargava et al. 1990) 

⁵ All quotes in this chapter (unless otherwise noted) have been borrowed from Bhargava et al. 
1990. 

1. Representing Dimensional Information 

The task of identifying dimensional information for each data element is simple 
because there is a small number of dimensions (e.g., length, mass, time, volume) recognized in 
most applications. Even if other dimensions such as currency are added, the set has few elements. 
Additionally, there is a "place holder" (represented by 1) for dimensionless quantities, such as 
percentages. Derived dimensions (e.g., volume, acceleration, weight, power) are also allowed. 

For two reasons, Bhargava et al. suggest the use of abstract dimensional expressions, 
e.g., currency rather than dollars, even though dimensional information is best captured using 
three components: dimension, unit, and scale. First, the unit information of "dollars" can be 
captured by the dimensional component, "currency." Second, the use of the most abstract 
dimensional expression reduces Type II errors when discovering naming problems (Bhargava et 
al. 1990). For example, suppose a variable is used in two models to measure a quantity of apples. 
In the first model, the variable X (for apples) is measured in bushels. In the second model, the 
variable Y (also for apples) is measured in quarts. There is a unique names violation, but if the 
dimensions are declared as "bushels" and "quarts," the rules will not find it because the declared 
dimensions are not the same. Since both bushels and quarts are measures of volume, the dimension 
could be stated more generally as "volume," allowing the naming violation to be detected. 
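
The effect of abstraction can be shown with a minimal sketch. The lookup table below is a hypothetical fragment that we supply for illustration; it is not a vocabulary defined by Bhargava et al. (1990).

```python
# Hypothetical mapping from units to abstract dimensions (illustrative only).
UNIT_TO_DIMENSION = {
    "bushels": "volume",
    "quarts": "volume",
    "dollars": "currency",
    "inches": "length",
    "feet": "length",
}


def abstract_dimension(unit: str) -> str:
    """Return the abstract dimension for a unit, or the unit itself if unknown."""
    key = unit.lower()
    return UNIT_TO_DIMENSION.get(key, key)


def same_dimension(unit_a: str, unit_b: str) -> bool:
    """Compare two declarations after abstracting each to its dimension."""
    return abstract_dimension(unit_a) == abstract_dimension(unit_b)


apples_match = same_dimension("bushels", "quarts")
mixed_match = same_dimension("dollars", "quarts")
```

Comparing the raw unit strings "bushels" and "quarts" would report a mismatch, whereas comparing their abstract dimensions reports a match, so the pair X and Y remains a candidate for review.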

2. Representing Quiddity 

The task of defining the meaning or the quiddity of each variable is more complex. 
The quiddity of a data element provides a description of what the data element is about. Clearly, 
quiddities must be stated unambiguously, in a formal language, in order that they be readable and 
comparable by a computer. Bhargava et al. establish five categories for capturing the quiddity of 
a variable: stuff, types of stuff, attributes of stuff, types of attributes of stuff, and metafunctions. 
To specify valid quiddity expressions, a basic vocabulary for each of the five components is 
provided.⁸ To develop the quiddity of a variable, each of the five components (described below) 
is examined and, if applicable, declared. The example shown in Figure 1 is designed to illustrate 
this definition process for each of the components. 

⁶ "Some authors use the term quantity ... [in place of] ... the term dimension ... [as it is 
used here]." (Bhargava et al. 1990) 

⁷ "From the Oxford English Dictionary, quiddity is 'The real nature or essence of a thing; that 
which makes a thing what it is.' Of course, ... [the method's] language for expressing quiddities 
is only a model, or approximation, of genuine quiddity, if it exists." (Bhargava et al. 1990) 


What is the quiddity for this variable? 

• Variable Name: purchase_cost 

• Data Dictionary Description: 
  "Average cost of purchasing a Dodge truck during the month of July." 

Figure 1 Illustration Variable 


a. Components of Quiddity 
(1) Stuff. Stuff answers the question "what is the variable about?" Stuff 
is usually indicated by a noun, describing individual things or collections of individual things, such 
as cars, trucks, or ships. What is the variable, shown in Figure 1, about? It's about a truck. 
Therefore, truck is the stuff component of this variable's quiddity. 
Additionally, a stuff term may have arity⁹ if one or more arguments are 
required to fully define the stuff term. With quiddity, arguments are added to the definition, 
when necessary, to further define stuff. Suppose "path" is the stuff expression. In this case, we 
would need to know the two end points of the path in order to define the exact path. Thus, "path" 
has an arity of 2 since it has two arguments (the two end points). There is no limit for the arity 
of a stuff term except that it be finite. Of course, some stuff expressions need no arguments (e.g., 
apple or ship) and have an arity of 0. 

In our example, we are interested in a truck purchased during a given 
month, July. In this instance, the stuff expression, truck, should be further defined because we 
are concerned with the truck at a specific point in time. Therefore, truck has arity of 1, with 
the argument month. 

⁸ For the purposes of the following discussion and examples, assume all terms used in 
developing quiddity are a part of an established basic vocabulary. 

⁹ Arity identifies the number of arguments required to specify a function. For example, the 
function of "addition" has an "arity of 2" because you must have two arguments in order to perform 
the function, in other words, to add. Division also has an "arity of 2," whereas the square root 
function has an "arity of 1" (you only need one argument to find the square root). 

(2) Stuff Type. Stuff type answers the question "what sort of or kind of 
stuff is it?" Stuff types further describe stuff. For example, with both stuff and stuff type we can 
distinguish between a "truck tire" and a "tire truck." In the first case, what is the variable about? 
It is about a tire. What sort of tire? A truck tire. Thus the stuff is tire and the stuff type is 
truck. However, in the second case, the variable is about a truck. What sort of truck? A tire 
truck. Thus the stuff is truck and the stuff type is tire. (Bhargava et al. 1990) To continue with 
the example in Figure 1, the stuff type of truck (stuff) is Dodge. 

(3) Stuff Attribute. Stuff attributes answer the question "what is it about 
the stuff that you are interested in?" Stuff attributes represent information about some aspect 
of the stuff we are interested in. From the example above, what is it about a truck that we are 
interested in? The cost. Therefore, cost is the stuff attribute of the variable purchase_cost. 

(4) Stuff Attribute Type. Stuff attribute types answer the question "what sort 
of or kind of stuff attribute is it?" From above, the stuff attribute was cost. What sort of cost 
are we interested in? Purchase cost. Thus, purchase is the stuff attribute type qualifying the 
stuff attribute cost. 

(5) Metafunctions. "Metafunctions capture information about the variable 
associated with the quiddity." Examples of metafunctions are average, maximum, minimum, sum, 
and variance. Type I errors in indicating possible naming violations can often be reduced by 
identifying metafunctions in quiddities. For instance, if A and B are variables for the price of fuel, 
but A is an average price while B is not, then no unique names violation should be indicated 
(Bhargava et al. 1990). From our illustration example, the metafunction associated with the 
variable purchase_cost is average. 
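
The five components can be gathered into a single record, as in the following minimal sketch. The Quiddity class and its field names are our own assumptions for illustration; they are not a data structure defined by Bhargava et al. (1990). Components that may hold several terms are modeled as tuples.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Quiddity:
    """Hypothetical record of the five quiddity components."""
    stuff: str
    stuff_args: Tuple[str, ...] = ()             # arguments of the stuff term
    stuff_types: Tuple[str, ...] = ()
    stuff_attribute: Optional[str] = None
    stuff_attribute_types: Tuple[str, ...] = ()
    metafunction: Optional[str] = None

    @property
    def arity(self) -> int:
        """Number of arguments required to define the stuff term."""
        return len(self.stuff_args)


# The illustration variable of Figure 1.
purchase_cost = Quiddity(
    stuff="truck",
    stuff_args=("month",),
    stuff_types=("Dodge",),
    stuff_attribute="cost",
    stuff_attribute_types=("purchase",),
    metafunction="average",
)

# Two fuel price variables that differ only in their metafunction.
price_a = Quiddity(stuff="fuel", stuff_attribute="price", metafunction="average")
price_b = Quiddity(stuff="fuel", stuff_attribute="price")
```

Because the records for A and B differ in the metafunction component, an equality test between them fails, and no unique names violation would be indicated for that pair.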

b. Formal Representation 

For a computer to be able to compare the quiddities of variables, the quiddities 
must be represented in a standard format or formal language. Bhargava et al. (1990) recommend 
and develop a rigorous representation in a formal language for capturing quiddity information. 
This representation is illustrated using the purchase_cost variable in Figure 2. In this 
representation, there may be instances where there are multiple terms in a component. When this 
happens, the terms are listed alphabetically, to remove ambiguity. Three additional examples of 
this representation are provided. While these examples are somewhat contrived and simplistic, 
they demonstrate the basic steps taken in developing the quiddity for a variable. 


Quiddity Representation 

Metafunction(Stuff Attribute Type(Stuff Attribute(Stuff Type(Stuff(Arg 1, ..., Arg n))))) 

Quiddity of purchase_cost: average(purchase(cost(Dodge(truck(month))))) 

Figure 2 Quiddity Representation 
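
The nesting shown in Figure 2 can be produced mechanically. The function below is a minimal sketch under our own assumptions: the function and parameter names are hypothetical, multiple terms within a component are sorted alphabetically without regard to case, and multi-word terms are joined with an underscore.

```python
def render_quiddity(stuff, args=(), stuff_types=(), attribute=None,
                    attribute_types=(), metafunction=None):
    """Build the nested functional form, innermost component first."""
    expr = stuff
    if args:
        # Stuff arguments keep their given order.
        expr = f"{stuff}({', '.join(args)})"
    if stuff_types:
        # Multiple terms in one component are listed alphabetically.
        terms = ", ".join(sorted(stuff_types, key=str.lower))
        expr = f"{terms}({expr})"
    if attribute:
        expr = f"{attribute}({expr})"
    if attribute_types:
        terms = ", ".join(sorted(attribute_types, key=str.lower))
        expr = f"{terms}({expr})"
    if metafunction:
        expr = f"{metafunction}({expr})"
    return expr


purchase_cost = render_quiddity(
    "truck", args=("month",), stuff_types=("Dodge",),
    attribute="cost", attribute_types=("purchase",), metafunction="average",
)
two_types = render_quiddity(
    "aircraft", stuff_types=("unmanned", "fighter"), attribute="tail_number",
)
```

Producing one canonical string per variable means that two quiddities can be compared by simple string equality, which is the property the alphabetical ordering rule is meant to secure.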


(1) Example 1. Consider a variable which captures information about the 
status of an unmanned fighter aircraft. What is the variable about? An aircraft (stuff). Does 
aircraft need further definition, or, in other words, is aircraft a function of something else? No, 
so aircraft has an arity of 0 (no arity arguments). What sort of aircraft (stuff) is it? It is an 
unmanned aircraft. It is also a fighter aircraft. We have two stuff types. What is it about the 
aircraft that we are interested in? Its tail_number¹⁰ (stuff attribute). What sort of tail 
number is it? We have no further information so we do not have a stuff attribute type. The 
quiddity representation for this example is shown in Figure 3. 








tail_number(fighter, unmanned(aircraft))

STUFF ATTRIBUTE:  tail_number
STUFF TYPES:      fighter, unmanned
STUFF:            aircraft


Figure 3 Quiddity Representation -- Example 1 






(2) Example 2. Consider a variable which captures information about the retail 
cost of an IBM personal computer. What is the data element about? A personal computer 
(stuff). Do we need any arguments to further define personal computer? No, thus there are 
no arity arguments (arity 0). What sort of personal computer (stuff) is it? It is an IBM (stuff 
type). What is it about the personal computer that we are interested in? The cost (stuff 
attribute). What sort of cost is it? Retail (stuff attribute type) cost. The quiddity representation 


is shown in Figure 4. 


retail(cost(IBM(personal_computer)))

STUFF ATTRIBUTE TYPE:  retail
STUFF ATTRIBUTE:       cost
STUFF TYPE:            IBM
STUFF:                 personal_computer

Figure 4 Quiddity Representation -- Example 2 





¹⁰Since tail number is a word phrase denoting one concept, the formal quiddity representation 
connects the words in this form "tail_number." This representation allows us to distinguish 
between word phrases which denote one value for a component (e.g., tail_number for the stuff 
attribute component) and two distinct values for a component (e.g., unmanned and fighter for the 
stuff type component). 
(3) Example 3. Consider a variable which captures information about the 
current replacement cost of a foreign car. What is the variable about? A car (stuff). Does car 
need any arguments to define it or, in other words, is it a function of something? Yes, we are 
interested in a car at a specific point in time. Therefore, car has an "arity of 1" with the 
argument time. What sort of car (stuff) is it? It is a foreign (stuff type) car. What is it about 
the car that we are interested in? The cost (stuff attribute). What sort of cost is it? 


Replacement (stuff attribute type) cost. The quiddity representation is shown in Figure 5. 


replacement(cost(foreign(car(time))))

STUFF ATTRIBUTE TYPE:  replacement
STUFF ATTRIBUTE:       cost
STUFF TYPE:            foreign
STUFF:                 car
ARITY:                 time





Figure 5 Quiddity Representation -- Example 3 


c. Validity Rules 
Given a basic vocabulary for each category, the following rules for determining 
valid stuff terms apply (Bhargava et al. 1990, 15). 

1. If α is in the vocabulary of basic stuff expressions, then α is a valid stuff term, providing 
that each of its arguments has the form arg(n), where n is an integer identified with a 
declared variable (or is a declared variable-indicating expression). 

2. If α is in the vocabulary of basic stuff expressions, then α[arg(n)] is a valid stuff term, 
where α[arg(n)] has one more argument than α and n is an integer identified with a 
declared variable (or is a declared variable-indicating expression) with a quiddity of index. 

3. φ(α) is a valid stuff term if α is a valid stuff term and φ is in the vocabulary of stuff types. 

4. φ(α) is a valid stuff term if α is a valid stuff term and φ is in the vocabulary of 
metafunctions. 

5. Nothing else is a valid stuff term. 
Given the above rules for determining valid stuff terms, the following rules for 
determining valid quiddity terms apply (Bhargava et al. 1990, 16). 


1. If α is a valid stuff term, then α is a valid quiddity term. 

2. φ(α) is a valid quiddity term if α is a valid stuff term and φ is in the vocabulary of stuff 
attributes. 

3. φ(α) is a valid quiddity term if α is a valid quiddity term and φ is in the vocabulary of 
metafunctions. 

4. φ(α) is a valid quiddity term if α is a valid quiddity term and φ is in the vocabulary of stuff 
attribute types. 

5. α · β and α / β are valid quiddity terms if α and β are valid quiddity terms. 


6. Nothing else is a valid quiddity term. 
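The two rule sets above are recursive definitions, so they translate naturally into a pair of recursive checks. The following is a minimal sketch under stated assumptions, not the implementation used by Bhargava et al.: the class names (Basic, Apply, Combine, Vocabulary) and function names are hypothetical, arity arguments are simplified to names of declared variables, and the "quiddity of index" condition in stuff rule 2 is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Basic:
    """A basic stuff expression with its arity arguments, e.g. truck(month)."""
    name: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Apply:
    """A vocabulary word applied to an inner term: phi(alpha)."""
    functor: str
    inner: "Term"


@dataclass(frozen=True)
class Combine:
    """Product or quotient of two quiddity terms (quiddity rule 5)."""
    op: str  # "*" or "/"
    left: "Term"
    right: "Term"


Term = Union[Basic, Apply, Combine]


@dataclass(frozen=True)
class Vocabulary:
    stuff: FrozenSet[str]
    stuff_types: FrozenSet[str]
    stuff_attributes: FrozenSet[str]
    stuff_attribute_types: FrozenSet[str]
    metafunctions: FrozenSet[str]
    declared_variables: FrozenSet[str]


def is_valid_stuff(term: Term, vocab: Vocabulary) -> bool:
    if isinstance(term, Basic):
        # Stuff rules 1-2 (simplified): every argument must be a declared variable.
        return term.name in vocab.stuff and all(
            arg in vocab.declared_variables for arg in term.args)
    if isinstance(term, Apply):
        # Stuff rules 3-4: a stuff type or metafunction wraps a valid stuff term.
        return (term.functor in (vocab.stuff_types | vocab.metafunctions)
                and is_valid_stuff(term.inner, vocab))
    return False  # Stuff rule 5: nothing else is valid.


def is_valid_quiddity(term: Term, vocab: Vocabulary) -> bool:
    if is_valid_stuff(term, vocab):
        return True  # Quiddity rule 1.
    if isinstance(term, Apply):
        # Quiddity rule 2: a stuff attribute wraps a valid stuff term.
        rule_2 = (term.functor in vocab.stuff_attributes
                  and is_valid_stuff(term.inner, vocab))
        # Quiddity rules 3-4: a metafunction or attribute type wraps a quiddity term.
        rules_3_4 = (term.functor in (vocab.metafunctions | vocab.stuff_attribute_types)
                     and is_valid_quiddity(term.inner, vocab))
        return rule_2 or rules_3_4
    if isinstance(term, Combine):
        # Quiddity rule 5: products and quotients of valid quiddity terms.
        return (term.op in ("*", "/")
                and is_valid_quiddity(term.left, vocab)
                and is_valid_quiddity(term.right, vocab))
    return False  # Quiddity rule 6: nothing else is valid.


# A small vocabulary covering the purchase_cost example (illustrative only).
vocab = Vocabulary(
    stuff=frozenset({"truck"}),
    stuff_types=frozenset({"Dodge"}),
    stuff_attributes=frozenset({"cost"}),
    stuff_attribute_types=frozenset({"purchase"}),
    metafunctions=frozenset({"average"}),
    declared_variables=frozenset({"month"}),
)
purchase_cost = Apply("average", Apply("purchase", Apply(
    "cost", Apply("Dodge", Basic("truck", ("month",))))))
print(is_valid_quiddity(purchase_cost, vocab))
```

Because a word may appear in more than one vocabulary, the check for an applied term tries rule 2 and rules 3-4 independently instead of committing to the first vocabulary that contains the word.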


3. General Observations 
a.  Quiddity Component Definitions 

The stuff term must be correctly identified in order to accurately capture 
quiddity because all other quiddity components are built upon the stuff term. If this term is 
accurately defined, the other components are determined with relative ease. However, it is 
confusing and often difficult to correctly determine the stuff term. 

Recall the variable purchase_cost, which captures information about the cost of 
purchasing a Dodge truck in the month of July. Originally, we answered the questions as follows. 
What is this variable about? It's about a truck (stuff). In this case, the purchase cost of the 
truck is a function of month, therefore the arity argument is month. What sort of truck? A 
Dodge (stuff type) truck. What is it about the truck we are interested in? The cost (stuff 
attribute). What sort of cost is it? Purchase (stuff attribute type) cost. There is a metafunction 


of average. The quiddity representation is shown again in Figure 6, Example (a). 
What would happen if we identified a different stuff term?¹¹ What is this 
variable about? It's about purchasing a truck in July. Therefore, the stuff term is purchase 
and the arity argument is month. What sort of purchase is it? It is a truck (stuff type) 
purchase. What sort of truck? A Dodge (stuff type) truck. What is it about the purchase that 
we are interested in? The cost (stuff attribute). What sort of cost? We have no further 
information so there is no stuff attribute type. There is a metafunction of average. The quiddity 
representation for this set of questions is shown in Figure 6, Example (b). 

Although the examples provided are somewhat contrived, they do demonstrate 
that the categories of quiddity appear sufficient to capture the meaning of the variables’ data. 
Yet the method used to determine the component definitions is not structured enough to elicit the 


same answer from different people. 


Example (a):

average(purchase(cost(Dodge(truck(month)))))

METAFUNCTION:          average
STUFF ATTRIBUTE TYPE:  purchase
STUFF ATTRIBUTE:       cost
STUFF TYPE:            Dodge
STUFF:                 truck
ARITY:                 month


Example (b):

average(cost(Dodge,truck(purchase(month))))

METAFUNCTION:     average
STUFF ATTRIBUTE:  cost
STUFF TYPES:      Dodge, truck
STUFF:            purchase
ARITY:            month


Figure 6 Two Examples of Quiddity Representation for the Variable Purchase_Cost 





¹¹It is recognized that there would be a set vocabulary available for choosing these terms. 
However, a particular word or words can be applied in more than one quiddity category depending 
upon need, as shown in paragraph A.2.a.(2), where both truck and tire are stuff and stuff types. 
b. Quiddity Equivalence Rules 

An important aspect of the authors’ premise, i.e., if two syntactically distinct 
variables have the same or equivalent dimension and quiddity, a possible unique name violation 
is indicated, is the notion of quiddity equivalence. What exactly constitutes quiddity equivalence? 
One issue addressed in the paper was whether the order of the quiddity components in the 
representation is important in establishing equivalence. For example, does it matter if the 
representation is stuff attribute type(stuff attribute(stuff type(stuff))) or stuff attribute(stuff attribute 
type(stuff type(stuff)))? No conclusion, one way or the other, was presented. Nonetheless, the authors 
did state that this ambiguity could be reduced by stipulating validity conditions of quiddity 
expressions and introducing equivalence transforms. The quiddity validity conditions were 
discussed earlier. An example of an equivalence transform would be to state that stuff 
attribute(stuff) = stuff(stuff attribute). However, in the implementation, only the most 
straightforward pattern matching rule was used. 

There are other aspects of equivalence which were not specifically addressed. 
Does it make a difference which category a term falls within as long as the term is included in the 
quiddity expression? For example, in Figure 6, are the two quiddities depicted equivalent? The 
same words are in each description! Are terms which are synonyms equivalent, i.e., are cost and 


price equivalent? These issues are discussed further in following sections. 
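The open questions above can be stated precisely by comparing candidate definitions of equivalence side by side. The sketch below is illustrative only: the function names, the dictionary layout, and the synonym table are assumptions introduced here, and neither definition is claimed to be the one intended by Bhargava et al. It contrasts a strict component-by-component match with a looser test that ignores which category a term falls within.

```python
from typing import Dict, FrozenSet, Mapping, Optional

# A quiddity as a mapping from component name to the set of terms it holds.
Quiddity = Dict[str, FrozenSet[str]]


def _normalize(q: Quiddity, synonyms: Optional[Mapping[str, str]]) -> Quiddity:
    """Drop empty components and replace each term by its preferred synonym."""
    table = synonyms or {}
    return {
        component: frozenset(table.get(term, term) for term in terms)
        for component, terms in q.items()
        if terms
    }


def strict_equivalent(q1: Quiddity, q2: Quiddity,
                      synonyms: Optional[Mapping[str, str]] = None) -> bool:
    """Equivalent only if every component holds exactly the same terms."""
    return _normalize(q1, synonyms) == _normalize(q2, synonyms)


def same_terms(q1: Quiddity, q2: Quiddity,
               synonyms: Optional[Mapping[str, str]] = None) -> bool:
    """Equivalent if the same words appear, regardless of component."""
    def bag(q: Quiddity) -> FrozenSet[str]:
        return frozenset().union(*_normalize(q, synonyms).values())
    return bag(q1) == bag(q2)


# The two quiddities of Figure 6.
fig6a: Quiddity = {
    "metafunction": frozenset({"average"}),
    "stuff_attribute_type": frozenset({"purchase"}),
    "stuff_attribute": frozenset({"cost"}),
    "stuff_type": frozenset({"Dodge"}),
    "stuff": frozenset({"truck"}),
    "arity": frozenset({"month"}),
}
fig6b: Quiddity = {
    "metafunction": frozenset({"average"}),
    "stuff_attribute": frozenset({"cost"}),
    "stuff_type": frozenset({"Dodge", "truck"}),
    "stuff": frozenset({"purchase"}),
    "arity": frozenset({"month"}),
}

print(strict_equivalent(fig6a, fig6b))  # component-wise comparison
print(same_terms(fig6a, fig6b))         # category-insensitive comparison
```

Under these definitions the two quiddities of Figure 6 differ under the strict test but agree under the category-insensitive test, which is exactly the ambiguity raised in the text. Passing a synonym table such as {"price": "cost"} shows how treating synonyms as equivalent would change either result.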


B. QUIDDITY ACQUISITION 

To summarize, Bhargava et al. proposed that each model variable be further defined in 
terms of its dimension and quiddity. A UNV is indicated if, and only if, both the dimension and 
the quiddity of two variables are equivalent. If the dimensions are not equivalent, it follows that 
the variables do not represent the same information and no UNV should be detected. Similarly, 
if the quiddities are not equivalent, it again follows that the variables do not represent the same 


information and no UNV should be detected. Since this thesis focuses primarily on the feasibility 
of quiddity as it applies to detecting naming problems during database integration, the dimension 
aspect will be ignored in further discussions. It would be unnecessary to examine equivalence of 
quiddities of two variables if the dimensions are not equivalent. Thus, when checking for quiddity 
equivalence between data elements in our experiments, we will assume that their corresponding 
dimensions are equivalent. 
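The decision rule summarized above can be written as a single predicate. This is a sketch under simplifying assumptions: the Variable class and the function indicates_unv are hypothetical names, and equivalence is reduced to exact string equality, which corresponds only to the most straightforward pattern matching rule. The dimension strings in the example are also made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A named variable with its dimension and quiddity (hypothetical record)."""
    name: str
    dimension: str
    quiddity: str


def indicates_unv(a: Variable, b: Variable,
                  assume_dimensions_equal: bool = False) -> bool:
    """Flag a possible unique names violation (UNV).

    A UNV is indicated only for syntactically distinct variables whose
    dimension and quiddity are both equivalent. Setting
    assume_dimensions_equal mirrors the assumption made in this thesis,
    where only quiddity is compared.
    """
    if a.name == b.name:
        return False
    dimensions_match = assume_dimensions_equal or a.dimension == b.dimension
    return dimensions_match and a.quiddity == b.quiddity


# Fuel price example: A is an average price, B is not, so no UNV is indicated.
a = Variable("A", "dollars/gallon", "average(price(fuel))")
b = Variable("B", "dollars/gallon", "price(fuel)")
c = Variable("C", "dollars/gallon", "average(price(fuel))")

print(indicates_unv(a, b))
print(indicates_unv(a, c))
```

Checking the dimension first is the cheaper test, which is one reason it is reasonable to skip the quiddity comparison whenever the dimensions differ.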
1. Preliminary Experiment 

The purpose of the preliminary experiment was to provide initial data about the 
acquisition of quiddity. Additionally, this experiment was intended to eliminate "noise" in the 
primary experiment thus preventing interference with the analysis of the concept itself. In other 
words, we wanted to ensure that all subjects had a clear understanding of the concept before 
conducting the primary experiment. 

a. Subjects 

Six Naval Postgraduate students enrolled in the Computer Systems Management 
(CSM) Curriculum participated in the experiment. The students were randomly selected and had 
varying military backgrounds: Army (2), Navy (3), and Marine Corps (1). All had varying degrees 
of "computer expertise," from little or none when beginning the CSM Curriculum to having an 
undergraduate degree in Computer Science. All students had completed a course in the 
application of database management systems, so all had a common background in database 
technology. 

b. Design of Experiment 

(1) Goal. The goal of the experiment was to gather data concerning the 

formulation of quiddity for data elements. The intent was to apply any new insights gained here 


to the design and execution of the next (primary) experiment. 
(2) Experiment Packet. Two databases (overlapping in their real world 
domains), the Virus Database and the Hardware and Software Tracking System Database, 
designed by Naval Postgraduate students as class projects for a database management course, 
were used as the basis for the experiment. Twelve data elements from each database were 
selected for quiddity formulation. Care was taken to ensure that unique name violations did exist 
among the chosen data elements from each database. Each experiment packet contained the 
following: an overall information sheet, a work sheet (for practice and instructional purposes prior 
to beginning the experiment), a basic instruction sheet, a blank answer sheet, a general vocabulary 
list (words were not separated into quiddity component areas), a list of data dictionary entries 
pertaining to the selected data elements, and sample reports displaying the data captured by the 
selected data elements. A sample of this packet is contained in Appendix A. 

(3) Procedure. Prior to beginning the experiment, a general overview of the 
thesis objectives was presented to the students. Each student was given an experiment packet, 
three students associated with each of the two databases. Next, the students were asked to read 
the general information sheet which included the purpose, background, details concerning quiddity 
concept and definitions, and examples. Then, the students were provided with instruction on the 
concept as well as on the representation and rules for quiddity formulation. Additionally, a work 
sheet of sample quiddity problems was provided and discussed with the students. The students 
were allowed to ask questions in order to clarify the concept. 

Detailed instructions were provided to the students on the conduct of the 
actual experiment. Each student was asked to formulate quiddities for the twelve data elements 
provided using any and all information provided in the packet. They were asked not to discuss 
their answers with the other students nor to seek assistance from them. The students were not 
required to construct the quiddity expressions using the representation outlined in Section A of 


this chapter. We were more interested in the terms themselves. To avoid confusion, students 


were required to annotate quiddity terms using a table format. They were asked to provide 
comments pertaining to their "thought process" when developing the quiddities. Additionally, they 
were asked to comment on any areas of the concept which seemed difficult or confusing. It was 
suggested that they use only the vocabulary provided in the vocabulary list. If the vocabulary list 
did not contain a word which the student felt was crucial to forming the correct quiddity, they 
were instructed to add this word to the vocabulary and support its selection with a written 
justification. There was no set time limit for completion. Students were allowed to take the 
experiment packets with them and return them upon completion. This experiment was loosely 
controlled in order to gather as much raw input as possible. 
c. Results 

The goal of this experiment was to investigate several aspects of quiddity 
acquisition and formulation which led to the following questions. First, were the quiddities 
developed by the students correct? Second, did the students understand the concept and apply 
it correctly? Third, were the quiddities developed by students working with the same database 
identical? 

The experiment results¹² were divided into two groups. The quiddities 
pertaining to the Virus Database were placed in Group 1 and the quiddities pertaining to the 
Hardware and Software Tracking System (HSTS) Database were placed in Group 2. There are 
a total of 36 quiddities in each Group, three for each of the twelve data elements. The correct 
quiddity¹³ of each data element was compared with the quiddities developed by the students. 


TABLE I shows summary statistics of the correctness of the quiddities in each Group. 


¹²All experiment results are contained in Appendix A. 


¹³A master list of "correct" quiddities was developed prior to the experiment. 


TABLE I QUIDDITY CORRECTNESS -- PRELIMINARY EXPERIMENT 


                                           Group 1        Group 2
Correct Quiddity Matches                   0/36 (0%)      7/36 (19%)
Correct Stuff Matches                      7/36 (19%)     24/36 (67%)
Correct Stuff Attribute Matches            24/36 (67%)    14/36 (39%)
Stuff Attribute Matching Correct Stuff     5/36 (14%)     6/36 (17%)
Stuff Matching Correct Stuff Attribute     0/36 (0%)      5/36 (14%)


The results suggest that the students did not understand the concept and so were not able to 
apply it correctly. Few quiddities were correctly defined, i.e., there were no matches¹⁴ between 
the correct quiddity and the experiment quiddities in Group 1, and only seven matches (out of a 
possible 36) in Group 2. Comparisons by quiddity component also showed some interesting trends. 
For the most part, the students were not able to correctly identify the stuff nor were they able 
to identify the stuff attribute. In fact, there were some instances where the students confused the 
stuff with the stuff attribute. Figure 7 shows specific instances of this confusion taken from the 


results. 


¹⁴In order to be counted as an exact match, the experiment quiddities must be identical, term 
for term, to the "correct" quiddity. 
Examples of data element stuff and stuff attribute confusion: 


              BOOT-SECTOR (Group 1)      VENDER (Group 2)

Master        indicator(damage)          name(vender)
              vs.                        vs.
Experiment    damage(virus)              vender(name)


NOTE: Notation = stuff attribute(stuff) 





Figure 7 Stuff and Stuff Attribute Confusion 


The quiddity comparisons within each Group reflected the same difficulties 
previously noted. There were no exact matches¹⁵ between the three quiddities for each data 
element in either Group. Likewise, the definitions of the stuff and stuff attribute components were 


seldom in agreement. TABLE II shows statistics of the sameness of the quiddities in each Group. 


TABLE II QUIDDITY SAMENESS -- PRELIMINARY EXPERIMENT 


                                   Group 1        Group 2
Exact Quiddity Matches             0/12 (0%)      0/12 (0%)
Exact Stuff Matches                4/12 (33%)     6/12 (50%)
Exact Stuff Attribute Matches      2/12 (17%)     2/12 (17%)
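The sameness counts follow a simple rule: a data element counts as an exact match only when all three students gave an identical answer. A minimal sketch of that tally is given below. The function names and the sample answers are hypothetical and are not taken from the experiment results; the formatting helper mirrors the "count/total (percent)" cells used in the tables.

```python
from typing import Dict, List


def count_exact_matches(answers: Dict[str, List[str]]) -> int:
    """Count data elements for which every student's answer is identical."""
    return sum(
        1 for responses in answers.values()
        if responses and len(set(responses)) == 1
    )


def as_table_cell(count: int, total: int) -> str:
    """Format a tally in the style of the table cells, e.g. 4/12 (33%)."""
    return f"{count}/{total} ({round(100 * count / total)}%)"


# Hypothetical stuff terms from three students for three data elements.
hypothetical_stuff_answers = {
    "ELEMENT_A": ["virus", "virus", "virus"],
    "ELEMENT_B": ["virus", "software", "virus"],
    "ELEMENT_C": ["disk", "disk", "disk"],
}

matches = count_exact_matches(hypothetical_stuff_answers)
print(as_table_cell(matches, len(hypothetical_stuff_answers)))
```

The same function applies to whole quiddities or to a single component, depending on what strings are placed in the lists.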





Student comments taken from discussions and written notes in the experiment 


packets also indicate confusion in applying the concept. The method for determining the stuff and 


¹⁵In order to be counted as an exact match, all three quiddities for the data element must be 
identical. Likewise, when counting exact matches between quiddity components, all three 
components for that data element must be identical. 
stuff attribute was not structured enough. The distinction between the stuff and stuff attribute 
component was unclear (see example in Section A.3.a of this chapter). This led to the inversion 
of both terms. Another problem area centered around the level of detail of the terms. For 
example, should the stuff term be vehicle or truck (assuming both words are included in the 
vocabulary)? If vehicle is the correct stuff term, the term truck could be the stuff type. 
Conversely, if truck is the correct stuff term, the term vehicle is unnecessary, i.e., the term 
provides no additional meaning. Arity also caused a great deal of confusion. Most students 
seemed at a loss when it came to determining the arity of a stuff term. 
2. Refined Concept 

We experienced difficulties similar in nature to those indicated by the initial 
experiment results when developing the master quiddities for the experiment. Clearly, the 
quiddity acquisition process requires refinement. The chief problem areas center around the lack 
of clear distinction between the stuff and stuff attribute components. This uncertainty led to 
confusion in discerning the arity of the stuff component and in identifying the sortal information 
provided by the stuff type and the stuff attribute type. Additionally, the level of detail required 
(e.g., virus vs. software) is unclear. 

How can these problems be resolved? Clearly, a more descriptive definition of the 
quiddity components is needed. In other words, what is the meaning of each component and what 
kinds of information is each meant to supply? We propose that this concept can be clarified by 


examining quiddity from a linguistic perspective. 
a. A Linguistic Perspective 

Linguistics¹⁶ is the science of language. Linguists divide knowledge about 
language into four overlapping components: the lexicon,¹⁷ phonology,¹⁸ syntax,¹⁹ and 
semantics.²⁰ We are interested in the grammatical context of syntax and semantics as they 
apply to quiddity. The following discussion will draw a parallel between the structure of sentences 
and quiddity. 

(1) Sentence Structure. A sentence consists of a linear sequence of words, one 
following the other. This composition of words follows regular patterns, otherwise known as 
syntactic rules, or grammar. Word order is important in English because it is an "analytic 
language," which means that the relationships of words in a sentence are indicated by the order 
in which they appear (Barnett 1964, 29). 

The two essential parts of every sentence are an actor (subject) and an 
action (verb). Without these two parts, the meaning or semantics would be unclear. The normal 
order of these parts in a simple English sentence is subject/verb. The subject is what a sentence 
is about. The verb expresses what action the subject does. To find the verb in a sentence we 
often first find the subject. To find the subject, we ask the questions "Who or what is the sentence 


about?" or "Who or what is doing something in the sentence?" Then, we name the subject and ask 


¹⁶From the American Heritage Dictionary, linguistics is "The study of the nature and structure 
of human speech." 


¹⁷From the American Heritage Dictionary, lexicon is "The morphemes of a language." A 
morpheme is "A meaningful linguistic unit consisting of a word, such as man, or a word element, 
such as -ed of walked, that cannot be divided into smaller meaningful parts." 


¹⁸From the American Heritage Dictionary, phonology is "The science of speech sounds, ...." 


¹⁹From the American Heritage Dictionary, syntax is (Gram.) "The way in which words are put 
together to form phrases and sentences." 


²⁰From the American Heritage Dictionary, semantics is "The study or science of meaning in 
language forms, ...." 
the questions "Did what?" or "Does what?" to find the verb. All other words in the sentence 
radiate from this subject/verb core. (Osborn 1989, 15-20) 
The subject/verb core and all other words in a sentence can also be defined 
by the eight parts of speech (Osborn 1989, 67). They are: 
1. Noun: any of a class of words naming or denoting a person, place, thing, idea, quality, 
etc. 


2. Verb: any of a class of words expressing action, existence, or occurrence; any phrase or 
construction used as a verb. 


3. Pronoun: a word used in the place of or as a substitute for a noun. 


4. Adjective: any of a class of words used to limit or qualify a noun or substantive (a word 
or group of words "subbing" as a noun). 


5. Adverb: any of a class of words used to modify the meaning of a verb, adjective, or other 
adverb, in regard to time, place, manner, means, cause, degree, etc. 


6. Preposition: a relational word that connects a noun, pronoun, or noun phrase to another 
element of the sentence, such as a verb, a noun, or an adjective. 


7. Conjunction: a word used to connect words, phrases, clauses, or sentences. 
8. Interjection: Wow! Phew! A word expressing emotion or simple exclamation, thrown 
into a sentence without grammatical connection. 
These definitions will also be used in describing the structure of quiddity. 

(2) Quiddity Structure. Similar to a sentence, quiddity consists of a linear 
sequence of components (rather than words), one following the other. Within components, the 
terms (if there is more than one) are listed alphabetically (a syntactic rule). Unlike sentences, 
where the relationships of words are indicated by their order, the relationships of quiddity 
components are indicated by their definitions. Because we are dealing with a formal language, 
component order, once designated, will never change. Component order is therefore irrelevant to 
determining meaning. For example, consider the quiddity cost(truck). If the formal language has 


designated the component order to be stuff attribute(stuff), then we know with certainty that cost 


is the stuff attribute and truck is the stuff. What does this mean? We must know the relationship 
between the components (their definitions) to grasp the meaning of the sequence of components. 
If we know that stuff tells us what the data element is about and that the stuff attribute tells us 
what it is about the stuff we are interested in, then we can glean the quiddity's meaning. With 
this relationship defined, we now know that the data element captures information about the cost 
of a truck. Conversely, consider the quiddity truck(cost). Given that the formal language has 
designated the component order to be stuff(stuff attribute) and the definitions for stuff and stuff 
attribute stated earlier still apply, we will derive the same meaning. The data element with this 
quiddity also captures information about the cost of a truck. It is important to note that 
component order is irrelevant in providing meaning only so long as we know with certainty which 
component is which. 
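The point about component order can be illustrated with a short sketch. The function label_components below is a hypothetical helper, not part of the formal language: it handles only simple expressions with one term per component, and it pairs each term with the component name that the designated order assigns to its position.

```python
import re
from typing import Dict, Sequence


def label_components(expression: str, component_order: Sequence[str]) -> Dict[str, str]:
    """Pair each term of a simple nested expression with its component name.

    component_order lists the component names from outermost to innermost,
    as designated by the formal language.
    """
    terms = [t.strip() for t in re.split(r"[()]", expression) if t.strip()]
    if len(terms) != len(component_order):
        raise ValueError("expression does not fit the designated component order")
    return dict(zip(component_order, terms))


# Same meaning under two different, but known, component orders.
first = label_components("cost(truck)", ("stuff attribute", "stuff"))
second = label_components("truck(cost)", ("stuff", "stuff attribute"))
print(first == second)

# Reading truck(cost) under the wrong order reverses the two components.
misread = label_components("truck(cost)", ("stuff attribute", "stuff"))
print(misread == first)
```

Once every term carries its component label, the two expressions describe the same thing: the cost of a truck. The misread case shows why the designated order must be known with certainty.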

As in a sentence, quiddity has two essential components, stuff and stuff 
attribute, without which, there would be no meaning. All other quiddity components qualify the 
stuff and stuff attribute, just as all other words in a sentence qualify the subject and verb. 
Additionally, a parallel can be seen between the "questions" associated with the stuff/stuff attribute 


and subject/verb. (Figure 8) 


Description 


Stuff/Stuff Attribute:                Subject/Verb:

* Stuff --                            * Subject --
  "What is it about?"                   "Who or what is the sentence about?"

* Stuff Attribute --                  * Verb --
  "What is it about the stuff           "(the subject) Did what?"
  that you are interested in?"          "(the subject) Does what?"





Figure 8 Parallel Between Stuff/Stuff Attribute and Subject/Verb 
b. Linguistics Applied to Quiddity Acquisition 

It is our premise that certain aspects of sentence structure, when applied to 
quiddity, will yield a more descriptive definition of the stuff and stuff attribute components. The 
key to developing the "correct" quiddity lies in correctly identifying the stuff and stuff attribute 
components. These components form the core of meaning for quiddity, just as the subject and 
verb form the core of a sentence. As presently defined, the stuff component is comprised of only 
one value or term while the stuff attribute is comprised of one or more values or terms. This 
ambiguity can be reduced by restricting both components to one and only one value or word 
phrase per quiddity definition. This parallels sentence structure in that a simple sentence has one 
and only one subject and verb. 

Even though we have reduced the scope of values for both components, we are 
still faced with the lack of a clear distinction between them. The quiddity acquisition process 
requires that one first determine the value of the stuff component. Once the stuff is determined, 
the value of the stuff attribute can be captured. While developing the master list of quiddities for 
the initial experiment, we found that often the first value to become apparent was actually the 
stuff attribute. This led to confusion because the first inclination was to apply the value to the 
stuff component. This, of course, will result in an incorrect quiddity definition.²¹ How can we 
modify the method to allow for first determining the stuff attribute? The first step is to further 
define the stuff attribute. If we compare all the stuff attribute values for each of the data elements 
in the master quiddity list of the initial experiment, we find that the values have a common 


characteristic. Each stuff attribute is a type of MEASURE of the stuff component. For example, 


²¹An example illustrating the reversal of the stuff and stuff attribute component was provided 
in Section A.3.a. of this chapter. 


in the quiddities name(software), cost(truck), and tail_number(aircraft), each stuff attribute (i.e., 
name, cost, and tail number) is a measure of the stuff (i.e., software, truck, and aircraft, 
respectively). 

To find this MEASURE, we first view the actual data contained in the field of 
the data element. Then we classify the data by grouping the collection under a general heading 
or name which answers the question "What is it?" or "What are these?" The aim is to categorize 
the actual words, codes, numbers, etc., that we see in the field. We are not concerned with what 
the data are representative of in the physical or concrete sense. We are looking for an abstract 
noun, not a concrete noun.²² The data in the field is a MEASURE of SOMETHING. The 
MEASURE is the stuff attribute and the SOMETHING is the stuff. Consider the following 
examples. Suppose a list of the data corresponding to values in a data element field appears as 


in Figure 9, Example (a). 


Example (a):        Example (b):

[$12.95]            [sofa]
[$16.50]            [chair]
[$18.75]            [TV]
[$26.75]            [table]
[$33.56]            [desk]





Figure 9 Examples of Data Element Values 


What values do we actually see in the field? We see a list of dollar amounts. What word can we 
use to categorize these amounts? Do they have a common characteristic? We can group them 
together and classify the amounts as prices or costs. Therefore, (choosing one of the terms) the 


stuff attribute is cost (assuming this term is included in the vocabulary list). Now that we have 


²²A concrete noun is the name of anything physical, anything that can be touched, seen, heard, 
smelled, or otherwise perceived by the senses and occupies space. An abstract noun is the name 
of a quality, state, or action. It is an idea, and so may not be touched, seen, heard, smelled, or 
otherwise perceived by the senses. (Osborn 1989, 19) 
found the stuff attribute we can determine the stuff. The data in the field is the cost of 
something. The something is the stuff. It could be the cost of trucks, cars, ships, etc. 

Now consider the list of data shown in Example (b) in Figure 9. What do we see 
in this field? We see a list of furniture. What word can we use to categorize this list? We might 
be tempted to say that the category of this data is "furniture" but we would be wrong. Our aim 
is to capture a measure of the data, not what the data represents in the physical sense. What 
are these lists of "words" we see in the field? They are names. Therefore, the stuff attribute is 
name. We now have names of something. Names of what? Furniture. So, the stuff attribute 
is name and the stuff is furniture. Of course, we will also have the data dictionary, a vocabulary, 
and any other available information to aid in defining these components. 

We can now apply the "questions" in the original method to these examples as 
a verification of our answers. The two procedures will complement each other. If the values 
chosen for the stuff and stuff attribute components comply with both methods, the chances of 
incorrectly defining either component will be minimal. Figure 10 steps through the original 


method supporting our selection of values in the examples above. 


Original Method 


Description:  This variable captures information 
              about the cost of a truck. 


What is it about?  A truck (stuff). 


What is it about the truck that we are interested in? 
Its cost. 


Description:  This variable identifies a specific 
              piece of furniture. 


What is it about?  Furniture (stuff). 


What is it about the furniture that we are interested in? 
The name (stuff attribute). 


Figure 10 Quiddity Acquisition -- Samples of Original Method 
Additionally, the remaining quiddity components, stuff type, arity argument(s), 
and stuff attribute type, as well as stuff and stuff attribute, can be likened to various parts of 
speech.²³ Both the stuff and stuff attribute values are usually indicated with nouns. However, 
a subtle difference is that the stuff attribute will generally be represented by an abstract noun 
while the stuff is represented by a concrete noun. Stuff types, which qualify stuff, and stuff 
attribute types, which qualify stuff attributes, are indicated by adjectives or adverbs. Both stuff 
types and stuff attribute types may have more than one value in a quiddity definition. This occurs 
when there is more than one qualifying term for the stuff or stuff attribute, as shown in the example 
depicting the quiddity of an unmanned fighter aircraft.²⁴ Both unmanned and fighter further 
describe the stuff term aircraft and are adjectives. There may be instances where a term is 
needed to further describe a stuff type or stuff attribute type term. If this occurs, the term, 
generally an adverb, is annotated as an additional stuff type or stuff attribute type term (as 
appropriate). When a stuff term has arity, its arguments will typically be represented by 
nouns. 

We have suggested several changes in the quiddity acquisition process based on 
a linguistic approach. The refined concept for quiddity acquisition presented above is summarized 
below. 

1. Gather Information. Examine the definition of the data element using the data dictionary 
and any other available information. 


2. Examine Data. Examine a collection or list of actual data values contained in the data 
element field. 


3. Classify Data. Classify the data by grouping the collection under a general heading or
name which answers the question "What is it?" or "What are these?" Each piece of data is
an instance of the same thing or quality of something. The data is a type of MEASURE
of something. This MEASURE is the stuff attribute.


23 See Chapter III, Section B.2.a.(1) for a list of the eight parts of speech.

24 See Chapter III, Section A.2.b.(1).
4. Find Stuff Attribute and Stuff. The stuff attribute measures something. The something
is the stuff. The stuff will generally be a noun which is the object of a prepositional
phrase associated with the stuff attribute. For example, if cost is the stuff attribute,
answering the question "Cost of what?" will lead to the stuff term. The "what" is the stuff
term; it is also the object of the prepositional phrase "of what" associated with the
stuff attribute term cost.





5. Verify Terms. Verify the stuff and stuff attribute terms by referring to the "questions" in
the original method. What is the data element about? The stuff. What is it about the 
stuff we are interested in? The stuff attribute. If the terms also satisfy these questions, 
continue defining the remaining quiddity components as described in the original method. 
If not, return to the first step and begin again. 

6. Define Remaining Components. Answer the questions "What sort of stuff is it?" (the stuff
type) and "What sort of stuff attribute is it?" (the stuff attribute type). Is the stuff term a
function of something else? If yes, determine the argument(s).
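
The outcome of these six steps can be pictured as a small record with one field per quiddity component. The sketch below is only an illustration in Python: the class name, the field names, and the stuff attribute value "range" are assumptions made for this example and do not come from the method itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quiddity:
    """One data element's quiddity (field names are illustrative assumptions)."""
    stuff: str                                      # what the data element is about
    stuff_attribute: str                            # what we want to know about the stuff
    stuff_types: frozenset = frozenset()            # terms qualifying the stuff
    arity_args: frozenset = frozenset()             # arguments the stuff is a function of
    stuff_attribute_types: frozenset = frozenset()  # terms qualifying the stuff attribute


# Steps 3 to 5: the MEASURE is "cost"; asking "Cost of what?" yields the stuff "truck".
truck_cost = Quiddity(stuff="truck", stuff_attribute="cost")

# Step 6: "unmanned" and "fighter" both qualify the stuff term "aircraft".
# The stuff attribute "range" is a made-up value for this sketch.
aircraft_range = Quiddity(
    stuff="aircraft",
    stuff_attribute="range",
    stuff_types=frozenset({"unmanned", "fighter"}),
)

print(truck_cost)
print(aircraft_range)
```

Components that are left undefined simply stay empty, which matches the case of a stuff term with no arity.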

The refined concept presented in this section addressed the problem of a
lack of clear distinction between the stuff and stuff attribute components in the original concept.
We have yet to address the remaining problem areas identified in the preliminary experiment,
namely, the confusion in discerning the arity of stuff, the sortal information provided by the stuff
type and stuff attribute type, and the level of detail required when defining components. These
issues will be discussed in Chapter IV based on data obtained in the primary experiment.


C. QUIDDITY MANIPULATION AND INFERENCING 
Our aim in this section is to present rules for determining quiddity equivalence. Based on
these rules, we present several quiddity comparison procedures. Both the rules and the procedures
are necessary in examining the feasibility of quiddity in support of automatic detection of
possible naming problems.
1. Rules for Quiddity Equivalence 
Recall Bhargava et al.'s (1990) premise that "if two syntactically distinct variables
have the same or equivalent dimension and quiddity, a possible unique names violation is
indicated." (Recall also that when checking for quiddity equivalence between data elements, we will
assume that their corresponding dimensions are equivalent.) The rules for determining quiddity
equivalence (presented below) are based on the following hypotheses:


H1. Stuff and stuff attribute are the most crucial quiddity components.

H2. Some people use more specific terms than others when defining quiddities, e.g., vehicle : truck.

H3. People developing quiddities are likely to confuse the values defining stuff type with the
values defining arity25 (arguments).

H4. Some people define quiddities more extensively than others, e.g., two values for stuff type
vs. one value for stuff type.

25 Here, and in subsequent references, we are using the word arity to denote the arguments
which are often needed to further describe stuff.

a. Term Equivalence

What constitutes quiddity equivalence? An obvious answer is that quiddities are
equivalent when they are syntactically identical, term for term. In other words, quiddities are
equivalent when all quiddity components are equivalent. When are quiddity components
equivalent? Again, an obvious answer is that the components are equivalent when the terms or
values in the components are equivalent. We now reach the core of the equivalence process.
When are terms equivalent? Obviously, terms are equivalent when they are syntactically identical,
e.g., cost is equivalent to cost. However, suppose the terms being compared for equivalence are
price and cost. Are these terms equivalent? The words are synonyms and, as such, their meanings
are equivalent. Another aspect of equivalence appears when we compare the terms vehicle and
truck. Are these terms equivalent? A truck is a vehicle. One term is simply more specific than
the other. Vehicle could refer to a truck, but it could also refer to a bus. If we say that these
terms are equivalent when they are really different (e.g., vehicle means bus), we run the risk of
identifying a possible naming problem when it does not exist (Type I error). However, if we say
the terms are not equivalent when they really are (e.g., vehicle means truck) and do not identify
a possible naming problem, we run the risk of not identifying a problem when there really is one
(Type II error). Since we are attempting to detect possible naming problems, we need to minimize
all errors, but specifically Type II errors. We do not want to miss detecting a possible naming
problem. Our premise is that in order to minimize Type II errors, we need the following three
basic rules for term equivalence:

1. Two terms are equivalent if they are identical, i.e., match syntactically.

2. Two terms are equivalent if they are synonyms.

3. Two terms are equivalent if one term is a specialization of the other in the sense that all
objects in the class represented by that term are also present in the class of objects
represented by the other term, e.g., truck : vehicle (from H2).
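
The three term equivalence rules can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's implementation: the synonym list, the "is a" table, and all function names are assumptions made for this example, and rule 3 is read here strictly as specialization (one term is an ancestor of the other), so sibling terms such as truck and bus are not treated as equivalent.

```python
# Hypothetical vocabulary for illustration only.
SYNONYMS = {frozenset({"price", "cost"})}
IS_A = {"truck": "vehicle", "bus": "vehicle", "professor": "person"}  # child -> parent


def is_specialization(term, general):
    """True if `term` is a direct or indirect specialization of `general`."""
    while term in IS_A:
        term = IS_A[term]
        if term == general:
            return True
    return False


def terms_equivalent(a, b, rule=3):
    """Apply term rule 1 (syntactic), 2 (adds synonyms), or 3 (adds specialization)."""
    if a == b:
        return True
    if rule >= 2 and frozenset({a, b}) in SYNONYMS:
        return True
    if rule >= 3 and (is_specialization(a, b) or is_specialization(b, a)):
        return True
    return False


print(terms_equivalent("price", "cost", rule=1))     # syntactic match only
print(terms_equivalent("price", "cost", rule=2))     # synonyms allowed
print(terms_equivalent("truck", "vehicle", rule=3))  # specialization allowed
```

Because each higher rule number includes the lower ones, choosing a higher rule can only add equivalences, which is what trades Type II errors for Type I errors.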

We can now use these three term equivalence rules, singly or in combination, 
when determining component equivalence. We have divided the quiddity components into two sets 
for comparison. One set, designated the Stuff Set, contains the components stuff, arity, and stuff 
type. The other set, designated the Stuff Attribute Set, contains the components stuff attribute 
and stuff attribute type. Our premise is that quiddities are equivalent if and only if their 
corresponding Stuff Sets and Stuff Attribute Sets are equivalent (from H1). The following 
sections present equivalence rules for each set. 

b. Stuff Set Equivalence 

Stuff Set equivalence is defined in terms of equivalence of its components. The
most evident and restrictive equivalence rule, alternative 1, is to require all components within the
set to be equivalent (based on the term equivalence rules above) in order to have Stuff Set
equivalence. Based upon the problem areas noted in the preliminary experiment, namely, the
confusion in discerning the arity of stuff, the sortal information provided by the stuff type, and the
level of detail required when defining components, this rule is too restrictive and would most likely
result in a high number of Type II errors (H2 and H4 apply here). For example, the problem
areas noted above will cause quiddities to be developed inconsistently. Even though the Stuff Sets
of two data elements should be equal (because the data elements actually represent the same
information), the inconsistency in defining the components would cause this rule to fail. Our
premise is that we can alter this rule in order to compensate for these inconsistencies (in lieu of
further refining the quiddity acquisition process). Consider the following example. The data
element purchase_cost in Database A with the quiddity purchase(cost(Dodge,used(truck(month))))
is compared to the data element truck_cost in Database B with the quiddity
purchase(cost(used(truck(month)))). The quiddity of the data element truck_cost is certainly less
specific than that of purchase_cost. Does this mean that the data elements do not actually
represent the same information? We cannot be sure without further examination, so we would
want these data elements flagged as a possible naming violation. For that to happen, their
quiddities (and thus their Stuff Sets) must be determined to be equivalent.

Our rule still states that Stuff Sets are equivalent if their three components are
equivalent. However, now the rules for arity and stuff type equivalence must be altered to the
following. (The stuff components are determined to be equivalent based upon the term
equivalence rules.) Alternative 2 is that the arity of two data elements is equivalent if the arity
arguments of one data element are contained in the arity arguments of the other data element.
Similarly, the stuff type components are equivalent if the stuff type terms of one data element are
contained in the stuff type terms of the other data element. For example, the stuff type term used
(belonging to the data element truck_cost) is contained in the set of stuff type terms Dodge and
used (belonging to the data element purchase_cost). Therefore, the stuff type components of the
data elements are equivalent. It should be noted that a term is contained in a set of terms if
it is equivalent to a term in the set based upon the term equivalence rules listed above.
(Additionally, an empty set is contained in any set.)

We can be even less restrictive in determining equivalence by combining the arity
arguments and stuff type terms of the Stuff Set into one set and comparing this set for
equivalence. Alternative 3 is that the Stuff Sets are equivalent if the stuff components of the
Stuff Sets are equivalent (based on the term equivalence rules above) and the set of arity
arguments and stuff type terms of one Stuff Set is contained in the set of arity arguments and
stuff type terms of the other Stuff Set. Taking this one step further by combining the terms of
all of the Stuff Set components into one set, alternative 4 is that the Stuff Sets are equivalent if
the terms in the combined set of one Stuff Set are contained in the set of combined terms of the
other Stuff Set.
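
The four Stuff Set alternatives can be compared side by side in a short Python sketch. Everything named here (the dictionary layout, the function names, and the default use of plain string equality as the term equivalence test) is an assumption for illustration. Alternative 1 is interpreted as requiring each component's terms to be contained in the other's in both directions.

```python
def contained_in(small, big, equiv):
    """Every term in `small` is equivalent to some term in `big` (an empty set always is)."""
    return all(any(equiv(s, b) for b in big) for s in small)


def either_contained(x, y, equiv):
    """x is contained in y, or y is contained in x."""
    return contained_in(x, y, equiv) or contained_in(y, x, equiv)


def stuff_sets_equivalent(a, b, alternative, equiv=lambda s, t: s == t):
    """a, b: dicts with 'stuff' (str), 'arity' (set), and 'stuff_type' (set)."""
    same_stuff = equiv(a["stuff"], b["stuff"])
    if alternative == 1:  # every component must match in both directions
        return (same_stuff
                and contained_in(a["arity"], b["arity"], equiv)
                and contained_in(b["arity"], a["arity"], equiv)
                and contained_in(a["stuff_type"], b["stuff_type"], equiv)
                and contained_in(b["stuff_type"], a["stuff_type"], equiv))
    if alternative == 2:  # arity and stuff type each compared by containment
        return (same_stuff
                and either_contained(a["arity"], b["arity"], equiv)
                and either_contained(a["stuff_type"], b["stuff_type"], equiv))
    if alternative == 3:  # arity and stuff type pooled before comparing
        return same_stuff and either_contained(
            a["arity"] | a["stuff_type"], b["arity"] | b["stuff_type"], equiv)
    if alternative == 4:  # all three components pooled
        return either_contained(
            {a["stuff"]} | a["arity"] | a["stuff_type"],
            {b["stuff"]} | b["arity"] | b["stuff_type"], equiv)
    raise ValueError("alternative must be 1, 2, 3, or 4")


# Stuff Sets of the purchase_cost and truck_cost quiddities discussed above.
purchase_cost = {"stuff": "truck", "arity": {"month"}, "stuff_type": {"Dodge", "used"}}
truck_cost = {"stuff": "truck", "arity": {"month"}, "stuff_type": {"used"}}

for alt in (1, 2, 3, 4):
    print(alt, stuff_sets_equivalent(purchase_cost, truck_cost, alt))
```

Only alternative 1 rejects this pair, because the stuff type terms differ in number. Alternative 3 additionally tolerates the confusion described in H3, where one person records a term as an arity argument and another records it as a stuff type.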
c. Stuff Attribute Set Equivalence 
The rationale presented above also applies to Stuff Attribute Set equivalence. 
Again, the most evident equivalence rule, alternative 1, is to require all components within the set
to be equivalent (based on the term equivalence rules above) in order to have Stuff Attribute Set 
equivalence. Our hypothesis, that confusion between components and differing levels of 
description detail in quiddity acquisition can be compensated for by altering the equivalence rules, 
applies here as well. Alternative 2 is that the Stuff Attribute Set is equivalent if the stuff 
attributes are equivalent (based on the term equivalence rules defined earlier) and the stuff 
attribute types are equivalent. Stuff attribute type components are equivalent if the stuff attribute 
type terms of one data element are contained in the stuff attribute type terms of the other data 
element. Further, by combining the terms of all of the Stuff Attribute Set components into one 
set, alternative 3 is that the Stuff Attribute Sets are equivalent if the terms in the combined set 
of one Stuff Attribute Set are contained in the set of combined terms of the other Stuff Attribute 
Set. 
2.  Quiddity Comparison Procedures 
In the preceding section, we defined several sets of quiddity equivalence rules. Various
comparison procedures can be defined by applying these rules in different combinations. For
clarity in discussion, these rules are depicted in Figure 11 in an abbreviated notation and are
divided into sets of numbered rules.
EQUIVALENCE RULES:

Set A - Term Equivalence

( ≡ means "is equivalent to" )

Term_1 ≡ Term_2 if rule 1, 2, or 3 is true.

1. Syntactic:
   Term_1 ≡ Term_2 if they match syntactically.

2. Synonym:
   Term_1 ≡ Term_2 if rule 1 is true OR if
   Term_1 and Term_2 are synonyms.

3. Network:
   Term_1 ≡ Term_2 if rule 1 is true OR if
   rule 2 is true OR if
   Term_1 and Term_2 are within the same
   classification network.

Set B - Stuff Set Equivalence

( a ⊆/⊇ b means "a is contained in b or b is contained in a" )

Stuff Set_1 ≡ Stuff Set_2 if rule 1, 2, 3, or 4 is true.

1. Plain and Simple:
   Stuff Set_1 ≡ Stuff Set_2 if
   stuff_1 ≡ stuff_2 and
   arity_1 ≡ arity_2 and
   stuff_type_1 ≡ stuff_type_2

2. Partial is-contained-in:
   Stuff Set_1 ≡ Stuff Set_2 if
   stuff_1 ≡ stuff_2 and
   arity_1 ⊆/⊇ arity_2 and
   stuff_type_1 ⊆/⊇ stuff_type_2

3. Part/Full is-contained-in:
   Stuff Set_1 ≡ Stuff Set_2 if
   stuff_1 ≡ stuff_2 and
   (arity + stuff_type)_1 ⊆/⊇ (arity + stuff_type)_2

4. Full is-contained-in:
   Stuff Set_1 ≡ Stuff Set_2 if
   (stuff + arity + stuff_type)_1 ⊆/⊇ (stuff + arity + stuff_type)_2

Set C - Stuff Attribute Set Equivalence

Stuff Attribute Set_1 ≡ Stuff Attribute Set_2 if rule 1, 2, or 3 is true.

1. Plain and Simple:
   stuff_attribute_1 ≡ stuff_attribute_2 and
   stuff_attribute_type_1 ≡ stuff_attribute_type_2

2. Partial is-contained-in:
   stuff_attribute_1 ≡ stuff_attribute_2 and
   stuff_attribute_type_1 ⊆/⊇ stuff_attribute_type_2

3. Full is-contained-in:
   (stuff_attribute + stuff_attribute_type)_1 ⊆/⊇
   (stuff_attribute + stuff_attribute_type)_2

Figure 11 Quiddity Equivalence Rules
a. Term Equivalence Rule Set
We defined three basic term equivalence rules in the previous section. These
basic rules are applied in three distinct combinations. Rule A1 (Set A, Rule 1) states that two
terms are equivalent if they are syntactically identical, or in other words, they are a syntactic
match. Rule A2 states that two terms are equivalent if they are synonyms or if they are a
syntactic match. Rule A3 states that two terms are equivalent if they are related, as in a
classification network, if they are synonyms, or if they are a syntactic match. Figure 12 illustrates
the organization of terms into a classification network.


The classification network below depicts relationships
between terms. For example, a professor is a person, a
manager is a person, and a student is a person.

(a)          person                    (b)      name
                |                                 |
             (is a)                            (is a)
                |                                 |
    professor  manager  student            title   surname

Figure 12 Classification Network


b. Stuff Set Equivalence Rule Set 
We defined four Stuff Set equivalence rules as shown in Set B of Figure 11. We
chose to maintain a strict equivalence rule in most combinations for the stuff component due to
its significance in the quiddity definition (from H1). Set B component equivalency rules are based
on the rules in Set A, e.g., components are equivalent if their terms are equivalent. Rule B1 states
that Stuff Sets are equivalent if each of their components is equivalent. Rule B2 states that two
Stuff Sets are equivalent if their stuff components are equivalent, the arity arguments of one are
contained in the arity arguments of the other, and the stuff type terms of one are contained in the
stuff type terms of the other.26 Rule B3 states that two Stuff Sets are equivalent if their stuff
components are equivalent and the combined set of arity and stuff type terms of one is contained
in the combined set of arity and stuff type terms of the other. Rule B4 states that two Stuff Sets
are equivalent if the combined set of stuff, arity, and stuff type terms of one is contained in the
combined set of stuff, arity, and stuff type terms of the other. The strictness of these rules can be
reduced slightly by varying the term equivalence rules.
c. Stuff Attribute Set Equivalence Rule Set

We defined three Stuff Attribute Set equivalence rules as shown in Set C of
Figure 11. We chose to maintain a strict equivalence rule in most combinations for the stuff
attribute component due to its significance in the quiddity definition (from H1). Set C component
equivalency rules are also based on the rules in Set A. Rule C1 states that Stuff Attribute Sets
are equivalent if each of their components is equivalent. Rule C2 states that two Stuff Attribute
Sets are equivalent if their stuff attribute components are equivalent and if the stuff attribute type
terms of one are contained in the stuff attribute type terms of the other. Rule C3 states that two
Stuff Attribute Sets are equivalent if the combined set of stuff attribute and stuff attribute type
terms of one is contained in the combined set of stuff attribute and stuff attribute type terms of
the other. Again, the strictness of these rules can be reduced slightly by varying the term
equivalence rules.

d. Procedures 

The three sets of rules can be combined into twelve distinct procedures. This
is best shown in a matrix format (see TABLE III). For each procedure, there must be one term
equivalence rule, one Stuff Set equivalence rule, and one Stuff Attribute Set equivalence rule.
In the matrix, both Rule B2 and Rule B3 (Stuff Set) are combined with Rule C2 (Stuff Attribute
Set) in separate procedures because Rules B2 and B3 more closely match in equivalence
concepts with Rule C2 than they do with Rule C3. The idea is to maintain equivalence consistency
between the Stuff Set and the Stuff Attribute Set. For example, it is not consistent to apply the
strictest rule of equivalence to the Stuff Set (e.g., Rule B1) while at the same time applying the
loosest equivalence rule to the Stuff Attribute Set (e.g., Rule C3) in determining quiddity
equivalence.

26 It should be noted that a term is contained in a set of terms if it is equivalent to a term
in the set based upon the term equivalence rules. Additionally, an empty set is contained in any
set.


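TABLE III EQUIVALENCE PROCEDURES

Set B, Set C | Set A, Rule 1 | Set A, Rule 2 | Set A, Rule 3
B1, C1       | A1,B1,C1      | A2,B1,C1      | A3,B1,C1
B2, C2       | A1,B2,C2      | A2,B2,C2      | A3,B2,C2
B3, C2       | A1,B3,C2      | A2,B3,C2      | A3,B3,C2
B4, C3       | A1,B4,C3      | A2,B4,C3      | A3,B4,C3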
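
The twelve procedures can be enumerated mechanically. The Python sketch below assumes the pairing of Stuff Set and Stuff Attribute Set rules implied by the consistency argument above: B2 and B3 with C2, B4 with C3, and (by elimination, an assumption of this sketch) B1 with C1. The variable names are illustrative.

```python
from itertools import product

TERM_RULES = (1, 2, 3)  # Set A rules

# (Set B rule, Set C rule) pairs kept consistent in strictness.
# The (1, 1) pairing is inferred by elimination and is an assumption here.
COMPONENT_RULE_PAIRS = ((1, 1), (2, 2), (3, 2), (4, 3))

# Each procedure is a triple (A rule, B rule, C rule).
procedures = [(a, b, c) for (b, c), a in product(COMPONENT_RULE_PAIRS, TERM_RULES)]

for a, b, c in procedures:
    print(f"A{a},B{b},C{c}")

print(len(procedures))
```

Restricting the Set B and Set C rules to four consistent pairs, rather than taking all 4 x 3 combinations, is what keeps the count at twelve instead of thirty-six.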





These twelve procedures were applied and tested using a prototype
application developed in Prolog.27 A given procedure is specified simply by specifying the
appropriate rules within each set. These procedures are examined in greater detail in Chapter
IV.

27 The prototype program listing, along with the data listing, can be found in Appendix B.
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IV. PRIMARY EXPERIMENT 


A. DESIGN OF EXPERIMENT 
1. Subjects 
The same six Naval Postgraduate students who participated in the preliminary 
experiment also participated in the primary experiment. These students were selected in order 
to take advantage of their experience in quiddity formulation. The intent of this selection was to 
eliminate any "noise" in the experiment (due to not understanding the concept) which could 
interfere with the analysis of the concept itself. 
2. Goal 
The goal of this experiment was to gather data concerning the formulation of quiddity 
for data elements using the refined concept described in Chapter III, Section B.2. These quiddities 
would then be compared using the procedures developed in Chapter III, Section C, to determine 
if the concepts were equivalently applied by the subjects and if the quiddities could be useful in 
support of automatic detection of unique name violations. 
3. Experiment Packet 
Two databases (overlapping in their real world domains), the Naval Postgraduate
School Automated Catalog (NAC) and the Course Requirements and Forecasting Tool (CRAFT),
designed by Naval Postgraduate students as class projects for a database management course,
were used as the basis for this experiment. Fifteen data elements from each database were
selected for quiddity formulation. Care was taken to ensure that unique name violations did exist
among the chosen data elements from each database. Each experiment packet contained the
following: all information included in the preliminary experiment packet, including the students'
original responses with the addition of "suggested quiddity answers"; a new information sheet; an
updated work sheet with answers to examples provided in the preliminary experiment; a basic
instruction sheet; a blank answer sheet; a vocabulary list (words were separated into quiddity
component areas); a list of data dictionary entries pertaining to the selected data elements; and
sample reports displaying the data captured by the selected data elements. An example of this
packet is contained in Appendix C.

4. Procedure

The procedure for this experiment was similar to the preliminary experiment in most
respects. Prior to beginning the experiment, a general overview of the thesis objectives was again
presented to the students. Each student was given an experiment packet, and three students were
associated with each of the two databases. Group integrity remained constant from the preliminary
experiment when assigning students to a database. Next, the students were asked to read the
new information sheet which included the purpose, a review of details concerning the quiddity
concept and definitions, and a new approach to be used in addition to the original concept. Then,
the students were provided with instruction on the new approach as well as a review of the original
concept. Additionally, an updated work sheet with answers to the sample quiddity problems used
during preliminary experiment instruction was provided and discussed with the students. The
students were allowed to ask questions in order to clarify the concept. Responses to the
preliminary experiment were discussed and further instruction was provided on the concept of
arity.

Conduct of the experiment was closely matched to that of the preliminary
experiment28 with the following exceptions. Each student was asked to formulate quiddities for
fifteen data elements (as opposed to twelve data elements in the preliminary experiment) and to
provide comments on the usefulness of the refined approach in quiddity formulation. Students
were also asked to comment on any areas of the concept which remained difficult or confusing.
Unlike the preliminary experiment, the students were restricted to using only the vocabulary
provided in the vocabulary list. If the vocabulary list did not contain a word which the students
felt was crucial to forming the correct quiddity, they were instructed to provide comments to that
effect but to complete all quiddities to the best of their ability using only the vocabulary in the list.
The vocabulary was restricted in order to increase control over the experiment, thus more
effectively testing the concept.

28 See Chapter III, Section B.1.b.(4).


B. EXPERIMENT RESULTS

The goal of this experiment was to investigate several aspects of quiddity acquisition and 
formulation, with emphasis on the problems noted in the preliminary experiment, and to test the 
hypotheses noted in Chapter III, Section C. Specific areas of interest are highlighted by the
following questions. Did the refined concept improve the distinction between the stuff and stuff 
attribute components, i.e., were their values still subject to inversion? Was the idea of arity 
understood and correctly applied? Did the equivalence procedures compensate for any of the 
problems noted in the preliminary experiment? 

1.  Quiddity Formulation 

The experiment results29 were divided into two groups. The quiddities pertaining
to the Naval Postgraduate School Automated Catalog (NAC) Database were placed in Group 1
and the quiddities pertaining to the Course Requirements and Forecasting Tool (CRAFT) Database
were placed in Group 2. There are a total of 45 quiddities in each Group, three for each of the
fifteen data elements. The correct quiddity30 of each data element was compared with the
quiddities developed by the students. TABLE IV shows summary statistics of the correctness of
the quiddities in each Group.

29 All experiment results are contained in Appendix C.

30 A master list of "correct" quiddities was developed prior to the experiment.
TABLE IV QUIDDITY CORRECTNESS -- PRIMARY EXPERIMENT

                                           | Group 1     | Group 2
Correct Quiddity Matches                   | 13/45 (29%) | 15/45 (33%)
Correct Stuff Matches                      | 39/45 (87%) | 35/45 (78%)
Correct Stuff Attribute Matches            | 25/45 (56%) | 27/45 (60%)
Stuff Attribute Matching Correct Stuff     | 0/45 (0%)   | 1/45 (2%)
Stuff Matching Correct Stuff Attribute     | 0/45 (0%)   | 4/45 (9%)


The results suggest that the students have a much better understanding of the
concept. There was a significant increase in the number of correctly defined quiddities in both
Groups, more than double the percentage correctly defined in the first experiment.31
Comparisons by quiddity component (stuff and stuff attribute) also improved greatly. Based upon
comments from the students, this overall improvement can be attributed to several factors. First,
all students indicated that the refined concept simplified and added clarity to the quiddity
acquisition process. Two students stated that they used only the refined concept in determining
the quiddity definitions, i.e., they did not use the original concept to verify their definitions.
Second, all students reported that the restrictive vocabulary reduced the uncertainty in defining
the quiddities. Third, all students related that their familiarity with the concept eased the task
of defining the quiddities in this experiment.

The quiddity comparisons within each Group improved overall. There were still very
few exact matches between the three quiddities for each data element in either Group. For the
most part, the quiddities were not exact matches because of differences within stuff
type and stuff attribute type. This reflects uncertainty in the level of detail required in defining
quiddity and supports our fourth hypothesis (see Chapter III, Section C). Some students
demonstrated a tendency to be consistently more specific than others. The number of exact
matches within the stuff and stuff attribute components increased significantly from the first
experiment. These improvements can also be attributed to the same factors discussed in
connection with quiddity correctness. TABLE V shows summary statistics of the sameness of the
quiddities developed within each Group.32

31 In order to be counted as an exact match, the experiment quiddities for the data elements
must be identical, term for term, to the "correct" quiddity.


TABLE V QUIDDITY SAMENESS WITHIN GROUPS -- PRIMARY EXPERIMENT

[Table body not recoverable from the source; the only surviving entries are 6/15 (40%) and 6/15 (40%).]

Arity continues to cause a great deal of confusion. Students remain at a loss when it
comes to determining the arity of a stuff term. In both Groups, arity was correctly identified by
only one student. It should be noted that the data pertaining to arity can be misleading. There
are only three data elements in the experiment (one in the NAC database and two in the CRAFT
database) which have an arity greater than 0 and require defining. Most students indicated that
they left the arity component blank because they were not certain if the stuff component had arity
greater than 0. This resulted in an arity "correctness" of 73% for Group 1 and 80% for Group 2
because 27 of the 30 data elements in the experiment have arity of 0!

32 A match here means that all three subjects in the same group used the exact same term(s).
2. Procedures For Quiddity Comparison
The data in the experiment was compared for equivalence using the twelve procedures
described in Chapter III, Section C. TABLE VI (taken from Chapter III, Section C) provides an
overview of the rule combination for each procedure. The set designation is now indicated by the
position of the rule number. For example, a procedure number now consists of just three
digits, e.g., 243. The number in the first position (2) indicates rule number 2 from Set A. The
number in the second position (4) indicates rule number 4 from Set B. The last number (3)
indicates rule number 3 in Set C.


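TABLE VI EQUIVALENCE PROCEDURES

Set B, Set C | Set A, Rule 1 | Set A, Rule 2 | Set A, Rule 3
1, 1         | 111           | 211           | 311
2, 2         | 122           | 222           | 322
3, 2         | 132           | 232           | 332
4, 3         | 143           | 243           | 343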










The experiment data consists of six sets of quiddities, three from the NAC Database 
and three from the CRAFT Database. Our experiment assumes that we are planning to integrate 
the two databases. Our goal is to detect possible naming problems (synonyms and homonyms) by 
comparing the quiddities for each database using the above procedures. There are a total of nine 
unique pair-wise combinations of quiddities (each of the three sets of CRAFT quiddities compared 
with each of the three sets of NAC quiddities). For each combination, there are 225 comparisons 
of data elements (15 x 15). The quiddities within each database were also compared with each 
other in order to provide data pertaining to the "sameness" of the quiddities. Finally, the master
quiddity lists for the two databases were compared.
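
The pair-wise comparison between two databases can be sketched as a double loop over data elements. The Python below is an illustration only: the function name, the tuple representation of a quiddity, and the sample data element names and quiddities are all hypothetical, and plain equality stands in for whichever equivalence procedure is selected.

```python
def find_naming_problems(db_a, db_b, quiddities_equivalent):
    """Compare every data element in db_a with every one in db_b.

    db_a, db_b: dicts mapping data element name -> quiddity.
    Returns (possible_synonyms, possible_homonyms) as lists of name pairs.
    """
    synonyms, homonyms = [], []
    for name_a, quid_a in db_a.items():
        for name_b, quid_b in db_b.items():
            same_name = name_a == name_b
            same_meaning = quiddities_equivalent(quid_a, quid_b)
            if same_meaning and not same_name:
                synonyms.append((name_a, name_b))   # different names, same thing
            elif same_name and not same_meaning:
                homonyms.append((name_a, name_b))   # same name, different things
    return synonyms, homonyms


# Hypothetical data: quiddities reduced to (stuff, stuff attribute) pairs.
db_a = {"FIRSTNAME": ("person", "given name"), "NAME": ("person", "last name")}
db_b = {"FNAME": ("person", "given name"), "NAME": ("person", "full name")}

synonyms, homonyms = find_naming_problems(db_a, db_b, lambda p, q: p == q)
print("possible synonyms:", synonyms)
print("possible homonyms:", homonyms)
```

With fifteen data elements on each side, the double loop performs the 15 x 15 = 225 comparisons counted above for each combination of quiddity sets.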

There were 192 database comparisons performed ((9 x 12) + (6 x 12) + 12), with a
grand total of 42,525 comparisons (when counting each data element comparison). The prototype
implementation produces a listing for each database comparison. The report lists pairs of data
elements which may have naming violations. These pairs of data elements are categorized as
possible homonyms or synonyms. Sample output listings are in Appendix C. These comparisons
(225 for each procedure plus 12 for the master quiddity comparison) were subdivided by procedure
and type (e.g., between databases, within database, and master) and analyzed to determine the
number of Type I and Type II errors. The raw data is compiled in tables located in Appendix
C. An analysis of this data is presented in the next section.


C. ANALYSIS OF COMPARISON PROCEDURES 

The objective of this section is to determine the combination of equivalence rules which will 
minimize Type I and Type II errors (with priority on Type II errors). There are a total of five 
synonyms and three homonyms in this experiment. 

1. Synonyms 

The number of Type II errors decreased or remained constant as the term equivalence 

rules became lax.?? The broader definition of equivalence increased the chances of correctly 
identifying all naming violations. As the component equivalence rules were broadened, the 
Type II errors decreased, but not at a very significant rate. (However, notice that there are only 
o true synonym problems.) Conversely, as the term equivalence rules became lax and the 
component equivalence rules broadened, Type I errors increased. Clearly, it is more important 
to prevent Type II errors, than it is to avoid increasing Type I errors. However, the results do 


indicate that the "middle of the road" procedures are best, i.e., those procedures using component 


33A procedure is more "lax" than another procedure if the following rule is true. 
Given Procedure, (i J K), Procedure, (i j,,k,), and "more lax" —— "< <": 
Proc, < < Proc, if 


i, < i, and 
j, < ), and 
k, < k, (where at least one is a strict inequality) 
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set rules 22 or 32. The "best" procedure is the one with the lowest error rate (both Type I and 
II errors). By providing weights to each type of error, we can choose the procedure with the 
lowest error rate. Even though the experiment quiddities were, on the average, only 31% 
correct, the trends in numbers of Type I and II errors very closely paralleled those of the master 
list. This seems to indicate that there may not be just "one" correct quiddity for a data element. 
Additionally, there is no difference between component equivalence rules 22 and 32. This seems 
to indicate that either arity is irrelevant or that the results are skewed. (The fact that 27 of the 
30 data elements have no arity could skew these results.) (See Figures 13, 14, 15, and 16) 
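
The "more lax" ordering of footnote 33 is a componentwise comparison of two procedures, and it can be sketched in a few lines of Python. This is only an illustration: representing a procedure as a tuple of three integers (i, j, k), and treating a larger number as a laxer rule, are assumptions made for the sketch and are not taken from the prototype.

```python
from typing import Tuple

# A procedure is identified by three rule numbers (i, j, k).
# Assumption for this sketch: a larger number denotes a laxer rule.
Procedure = Tuple[int, int, int]


def precedes(p1: Procedure, p2: Procedure) -> bool:
    """Return True when p1 << p2: no component of p1 exceeds the matching
    component of p2, and at least one component is strictly smaller."""
    pairs = list(zip(p1, p2))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


print(precedes((1, 2, 2), (1, 3, 2)))  # True: only j differs, and it is smaller
print(precedes((1, 2, 2), (1, 2, 2)))  # False: no strict inequality
print(precedes((1, 3, 1), (2, 2, 1)))  # False: the two are not comparable
```

The last call shows that the ordering is only partial: some pairs of procedures cannot be ranked against each other by laxness.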
2.  Homonyms 

Since this thesis focused primarily on the synonym problem, homonyms will only be 
addressed briefly. Homonyms appear to be a much simpler problem to detect than do synonyms 
because it is necessary to compare quiddities only when two data element names are syntactically 
identical. However, the same methods apply once the identical names are detected. 

The number of Type II errors increased as the term equivalence rules became lax.
The broader definition of equivalence increased the chances of failing to identify all naming
violations. As the component equivalence rules were broadened, the Type II errors increased
significantly. Type I errors were nonexistent throughout. To identify a Type I error, the
procedure would have to incorrectly determine that equivalent quiddities are not equivalent while
at the same time detecting identical data element names. This circumstance appears to be a rare
occurrence. All our experimental results point to the conclusion that the best procedure for
detecting homonyms is the one which is the most strict, i.e., not lax. (See Figures 17 and 18)
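
The homonym check described above (compare quiddities only when two data element names are identical) might look like the following sketch. The two small databases, their quiddity strings, and the default exact-match equivalence test are hypothetical and serve only to illustrate the idea; the prototype's equivalence rules are richer than string equality.

```python
from typing import Callable, Dict, List


def possible_homonyms(
    db1: Dict[str, str],
    db2: Dict[str, str],
    equivalent: Callable[[str, str], bool] = lambda a, b: a == b,
) -> List[str]:
    """Return the names that occur in both databases but whose quiddities
    are not judged equivalent. Each database maps name -> quiddity."""
    shared_names = sorted(set(db1) & set(db2))
    return [n for n in shared_names if not equivalent(db1[n], db2[n])]


# Hypothetical data: NAME holds a last name in one database, a full name in the other.
personnel = {"NAME": "last(name(person))", "SSN": "number(social security(person))"}
payroll = {"NAME": "full(name(person))", "SSN": "number(social security(person))"}

print(possible_homonyms(personnel, payroll))  # ['NAME']
```

Passing a stricter or laxer `equivalent` function is the point at which the different equivalence rules would enter.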


34. Given the total number of Type I errors, $N_I$, and the total number of Type II errors, $N_{II}$,
and weights, $W_I$ and $W_{II}$, respectively, then the error rate for the procedure is:
$\text{error-rate}(N_I, N_{II}) = W_I \cdot N_I + W_{II} \cdot N_{II}$ ($W_I$ will normally be less than $W_{II}$)
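
As an illustration of the weighted error rate, the following sketch picks the procedure with the lowest weighted sum of Type I and Type II errors. The procedure labels, the error counts, and the weights are invented for the example; they are not results from the experiment.

```python
from typing import Dict, Tuple


def error_rate(n1: int, n2: int, w1: float, w2: float) -> float:
    """Weighted error rate: w1 * (Type I count) + w2 * (Type II count)."""
    return w1 * n1 + w2 * n2


def best_procedure(counts: Dict[str, Tuple[int, int]], w1: float, w2: float) -> str:
    """Return the procedure whose (Type I, Type II) counts give the lowest rate."""
    return min(counts, key=lambda name: error_rate(*counts[name], w1, w2))


# Hypothetical (Type I, Type II) error counts for three procedures.
counts = {"strict": (0, 4), "middle": (4, 1), "lax": (9, 0)}

# Type II errors weighted three times as heavily as Type I errors.
print(best_procedure(counts, w1=1.0, w2=3.0))  # middle
```

With these made-up numbers the rates are 12, 7, and 9, so the middle procedure wins; a different choice of weights can change the ranking.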
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Figure 13 Synonyms -- Type II Errors (Experiment) 
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Figure 14 Synonyms -- Type II Errors (Master) 
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Synonyms —- Type | Errors 
By Term Equivalence Rule 
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Figure 15 Synonyms -- Type I Errors (Experiment) 
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Figure 16 Synonyms -- Type I Errors (Master) 
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Homonyms —— Type ll Errors 
By Term Equivalence Rule 
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Figure 17 Homonyms -- Type II Errors (Experiment) 
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Figure 18 Homonyms -- Type II Errors (Master) 
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V. CONCLUSION 


A. CONTRIBUTIONS AND LIMITATIONS 

This thesis has examined and enhanced a method for automatically detecting possible 
naming problems of data elements prior to database integration. We explored several aspects of 
quiddity, namely, quiddity concept definition, quiddity acquisition, and quiddity manipulation and 
inferencing procedures. Specifically, we administered the first real, experimental application of 
the concept of quiddities. With a careful analysis and extensive examples, the concept was refined 
and adapted to the database environment. Our research also constitutes the first important study 
on quiddity acquisition. We investigated how the issues of vocabulary, synonyms, classification 
properties, and degrees of specificity affect quiddity acquisition. Finally, we developed, 
implemented, and tested a number of alternative inference procedures, along with equivalence 
rules, for use in automatically detecting possible naming problems. 

Our research indicates that the concept of quiddity can be applied in the database context 
to provide a basis for automatically detecting unique name violations. Experiment results show 
that two individuals seldom consistently develop syntactically identical quiddities for the same data 
elements. However, we found that by varying the rules for equivalence, these differences in 
defining quiddities could be compensated for, ultimately resulting in equivalent quiddities (as they 
were initially supposed to be). The use of a specific vocabulary, coupled with the use of synonyms 
significantly countered this problem of inconsistency. Conversely, the use of classification 
properties tended to exacerbate this problem. However, the size and number of databases limit
the scope of our conclusions. Additionally, our experiments were not fully controlled. This fact
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aided our efforts in gathering as much data as possible, but limits our ability to advance any firm, 
fully supported conclusions. Our research does indicate the direction that full,
formal testing should follow.

We also presented several inference procedures for evaluating quiddity equivalence. Initial 
results indicate that the best procedure for detecting synonyms is one that lies (approximately half 
way) between those procedures with the most strict and the most lax equivalence rules. On the 
other hand, indications are that homonyms are best found utilizing a procedure with very strict 
equivalence rules. 

The concept of quiddity is a complex issue. Correct and consistent application of this 
concept depends upon a clear and unambiguous understanding of each of the components 
comprising quiddity. Clearly identifying each component with more descriptive names would 
facilitate comprehension of the concept. For example, the word "quiddity" succinctly and 
appropriately describes the semantic information being captured. However, the words "stuff" and 
"stuff attribute" are vague, unclear descriptions of the quiddity components. Therefore, we 
propose the following name changes in future applications of this concept. 

1. Stuff. Stuff describes what the data element is about. All other quiddity components 
revolve around this description as it is the heart of the quiddity definition. A more 
appropriate and descriptive title is "gravamen." From Roget’s II, The New Thesaurus, 
gravamen is "The most central and material part." 

2. Stuff Attribute. Stuff attribute is a measure of the stuff component. Based on the
name suggested for the stuff component, a fitting and more specific title is
"gravamen measure."


3. Stuff Type. Stuff type further describes stuff. Following the recommendations above, 
a more suitable title would be "gravamen type" or "gravamen qualifier." 


4. Stuff Attribute Type. Stuff attribute type further describes stuff attribute. Similarly, 
a pertinent title is "gravamen measure type" or "gravamen measure qualifier." 
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B. ISSUES FOR FURTHER RESEARCH 

There are several issues for consideration in further research. Formal testing of the 
processes in quiddity acquisition is needed as the model has yet to be validated. The quiddity 
inferencing procedures should be further developed and tested on a more extensive database. 
Additionally, the prototype can be refined and improved to increase efficiency. More in-depth
analysis of the linguistic aspects is feasible. Could it lead to a theory? Development of an
interactive system to support quiddity declarations would aid in quiddity acquisition. For example,
the system would check validity of the quiddity definitions and provide alternatives (e.g., if
dimension = currency, then the stuff attribute = cost, price, value ... ). Finally, can the concept
of quiddity be helpful in identifying different representation conflicts, in addition to naming
conflicts? In summary, the concept of quiddity, in addition to demonstrating usefulness in
detecting naming problems in database integration, may also be useful in detecting or resolving
other conflict areas in database integration.
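
The kind of interactive support suggested above could start from a simple lookup that proposes stuff attributes for a given dimension and flags declarations that fall outside the proposals. The sketch below is speculative: only the currency entry comes from the example in the text, and the function names and the rule of accepting anything for an unlisted dimension are assumptions.

```python
from typing import Dict, List

# Dimension -> stuff attributes the system would propose.
# Only the "currency" entry is taken from the text; more would be added in practice.
SUGGESTED_ATTRIBUTES: Dict[str, List[str]] = {
    "currency": ["cost", "price", "value"],
}


def suggest_stuff_attributes(dimension: str) -> List[str]:
    """Return the stuff attributes proposed for a dimension (empty if unknown)."""
    return list(SUGGESTED_ATTRIBUTES.get(dimension.strip().lower(), []))


def is_consistent(dimension: str, stuff_attribute: str) -> bool:
    """Accept a declaration if the dimension has no proposals on record,
    or if the stuff attribute is one of the proposals."""
    suggestions = suggest_stuff_attributes(dimension)
    return not suggestions or stuff_attribute.strip().lower() in suggestions


print(suggest_stuff_attributes("Currency"))  # ['cost', 'price', 'value']
print(is_consistent("currency", "weight"))   # False
```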
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APPENDIX A -- PRELIMINARY EXPERIMENT 


This Appendix contains samples of the contents of the packet given to the students during
the preliminary experiment. Additionally, a table with each student's experiment results
(quiddity definitions), along with the master quiddities for this experiment, is included. The
items appear in the following order:


Information Sheet
Work Sheet
Instruction Sheet (with blank answer sheets)
Vocabulary
Data Dictionaries
Sample Database Reports
Experiment Quiddity Definitions
Master Quiddity Definitions
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EXPERIMENT #1 


A. PURPOSE 


The title of my proposed thesis is "The Problem of Unique Names 
Violations in Database Integration." The general area of research 
will be experimenting with a proposed method for automatically 
detecting possible naming problems of data elements prior to 
database integration. The purpose of this experiment is to gather 
data to assist me in analyzing this proposed method. 


B. BACKGROUND 


A database can be defined as "a store of integrated data 
capable of being directly addressed for multiple uses; ... ." The 
data in a database are stored in units called data elements. Each 
data element has a unique name associated with it. For example, 
the data element which contains an individual's social security 
number could be called "SSN." Data elements also have assigned 
data types (i.e., integer, character, etc.) and field lengths. 

As databases continue to grow and develop, the number of uses
for the databases also increases. To support this growth, a need to
integrate/combine databases has appeared. One aspect which must be 
dealt with before integration can occur is the problem of naming 
conflicts in like data elements. Specifically, the problem deals 
with two or more data elements having different names in each 
database but containing information about the same thing. For 
example, one database might call the data element which contains a 
social security number, "SSN," while another calls it "SSNO." 
Before these two databases can be merged, the naming conflict must 
be resolved. 

How do we find these conflicts? Clearly, we need more semantic 
information: information about what the data element represents. 
There are two basic methods currently used in identifying these 
conflicts. The first method is a syntactic check: it compares the
data element names syntactically or matches data types (integer,
character, etc.) or field lengths. The second method involves a
screen of the data dictionary. The data dictionary has more 
descriptive information about the data elements but is written in 
natural language, which is not useful for machine inference. The 
proposed method contained in my thesis involves further defining 


1. Elias M. Awad, Management Information Systems:
Concepts, Structure, and Applications (Menlo Park,
California: The Benjamin Cummings Publishing Company, Inc.,
1988), p. 593.
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each data element by providing dimensional information and 
information about the nature or essence (quiddity) of the data 
contained in the data element. By comparing the dimensional 
information and quiddity of data elements in databases to be 
integrated, we hope to easily identify any naming conflicts which 
exist. The primary emphasis for the experiment concerns developing 
the "quiddity" of data elements. 


C. QUIDDITY


"Quiddity" is the name given to the description of what
information is captured by the data element. For example, you
might have a data element named "cost." You can probably surmise
that the data element contains the cost of something, but what is
that something? If we knew the quiddity of this data element, we
would know what the something is.


1. Components of Quiddity


In order to use a computer program to compare the quiddity
of data elements, we need to have a standard way of recording it
without writing it in natural language form. For example, let's
suppose that the data element "cost" captures information about the
"retail cost of an IBM personal computer."
We must dissect this definition into parts, almost like diagramming
a sentence. When you diagram a sentence, you list the subject,
adjectives, adverbs, verb, etc. When you determine quiddity,
you must list the "stuff, stuff types, stuff attributes, stuff
attribute types, and arity."


a. Stuff 


"Stuff" answers the question "What is the data element 
about?," or put another way, it is the subject of the description. 
Stuff is usually indicated by a noun, describing individual things 
or collections of individual things, i.e., cars, trucks, ships, 
etc. In the above example, the stuff of the data element "cost" is 
"personal computer." 


b. Stuff Type


"Stuff type" answers the question "What sort or kind
of stuff is it?" Stuff types are usually indicated with an
adjective but can also be indicated by a noun. Stuff types
further describe stuff. For example, with both stuff and stuff
type we can distinguish between a "truck tire" and a "tire truck."
In the first case, what is the data element about? It is about a 
tire. What sort of tire? A truck tire. Thus the stuff is tire 
and the stuff type is truck. However, in the second case, the data 
element is about a truck. What sort of truck? A tire truck. Thus 
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the stuff is truck and the stuff type is tire. In our example 
above, the stuff type of "personal computer" (stuff) is "IBM." 


c. Stuff Attributes


"Stuff attributes" answer the question "What is it
about the stuff that you are interested in?" Stuff attributes are
usually indicated with nouns. What is it about a "personal
computer" that we are interested in? The cost. So "cost" is the
"stuff attribute" of "personal computer" (stuff).


d. Stuff Attribute Types 


"Stuff attribute types" answer the question "What sort 
of stuff attribute is it?" Stuff attribute types usually qualify 
measurements and are typically indicated with nouns. From above, 
the stuff attribute was "cost." What sort of "cost" are we 
interested in? Retail cost. Thus "retail" is the stuff attribute 
type of the stuff attribute "cost." 


e. Arity


When a term has "arity," it can be defined by one or
more arguments. "Arity" is a term more commonly used in
mathematics in conjunction with functions. For example, the
function of "addition" has an arity of "2" because you must have
two arguments in order to perform the function, in other words, to
add. Division also has an arity of 2 whereas the square root
function has an arity of 1 (you only need one argument to find the
square root). With quiddity, we use arguments, when necessary, to
further define "stuff." For example, some stuff expressions may
have no arguments, i.e., truck, ship, computer, etc., and would
have an arity of "0." We do not need any further definitions to
know what a ship or a computer is. However, suppose "path" is the
stuff expression. In this case, we would need to know the two end
points of the path in order to define the exact path. Thus, "path"
has an arity of "2" since it has two arguments (the two end
points). In our example with the data element "cost," the stuff
expression has an arity of "0."
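
The idea of arity can be illustrated with a short Python sketch. The first part counts the arguments of two ordinary functions; the second part applies the same idea to stuff expressions. The table of stuff arities and the helper names are assumptions made for this illustration, using only the examples mentioned above.

```python
import inspect


def add(a, b):
    return a + b


def square_root(x):
    return x ** 0.5


def arity(function) -> int:
    """Number of arguments a function takes."""
    return len(inspect.signature(function).parameters)


# Arity of some stuff expressions, taken from the examples in the text.
STUFF_ARITY = {"truck": 0, "ship": 0, "computer": 0, "path": 2}


def stuff_expression(stuff: str, *arguments: str) -> str:
    """Write a stuff term with its arguments, checking the expected arity."""
    if len(arguments) != STUFF_ARITY[stuff]:
        raise ValueError(f"{stuff} needs {STUFF_ARITY[stuff]} argument(s)")
    return f"{stuff}({', '.join(arguments)})" if arguments else stuff


print(arity(add), arity(square_root))      # 2 1
print(stuff_expression("ship"))            # ship
print(stuff_expression("path", "A", "B"))  # path(A, B)
```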


2. Notation


Now that we have defined all the components of quiddity, we
must have a way of recording the information. In general, quiddity
notation will take the following form:


Stuff Attribute Type(Stuff Attribute(Stuff Type(Stuff(Argument 1, Argument 2, ... Argument N)))) 
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There may be instances where there is more than one term for each 
category. When this happens, the terms should be listed 
alphabetically. 
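
One way to picture the notation is as a small record that is written out from the inside (stuff) to the outside (stuff attribute type). The sketch below is an illustration only; the class name, the field names, and the case-insensitive alphabetical sort are assumptions, not part of the proposed method.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Quiddity:
    stuff: str
    stuff_types: List[str] = field(default_factory=list)
    stuff_attributes: List[str] = field(default_factory=list)
    stuff_attribute_types: List[str] = field(default_factory=list)
    arguments: List[str] = field(default_factory=list)  # arity arguments, if any

    def notation(self) -> str:
        """Build Stuff Attribute Type(Stuff Attribute(Stuff Type(Stuff(args))))."""
        expr = self.stuff
        if self.arguments:
            expr += "(" + ", ".join(self.arguments) + ")"
        for group in (self.stuff_types, self.stuff_attributes,
                      self.stuff_attribute_types):
            # Wrap in reverse alphabetical order so that the finished
            # expression reads alphabetically from left to right.
            for term in sorted(group, key=str.lower, reverse=True):
                expr = f"{term}({expr})"
        return expr


pc = Quiddity("personal computer", ["IBM"], ["cost"], ["retail"])
print(pc.notation())  # retail(cost(IBM(personal computer)))
```

A quiddity with two stuff types, such as "big" and "red," is written with the alphabetically first term outermost.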


a. Example 1 


Suppose you have a data element which captures 
information about the cost of a big red balloon. What is the data 
element about? A balloon (stuff). Does balloon need any arguments 
to define it? No, so "balloon" has an arity of "0" (no arity 
arguments). What sort of balloon (stuff) is it? It is big and 
red. We have two stuff types. What is it about the balloon that 
we are interested in? The cost (stuff attribute). What sort of 
cost is it? We don't know from the information given so we don't 
have a stuff attribute type. The quiddity for this example is: 


cost(big(red(balloon)))

    cost        STUFF ATTRIBUTE
    big, red    STUFF TYPES
    balloon     STUFF


b. Example 2 


Let's look again at the data element "cost" which
captures information about the "retail cost of an IBM personal
computer." What is the data element about? A personal computer 
(stuff). Do we need any arguments to define personal computer? 
No. Thus there are no arity arguments (arity 0). What sort of 
personal computer (stuff) is it? It is an IBM (stuff type). What 
is it about the personal computer that we are interested in? The 
cost (stuff attribute). What sort of cost is it? Retail (stuff 
attribute type) cost. The quiddity is: 


retail(cost(IBM(personal computer)))

    retail               STUFF ATTRIBUTE TYPE
    cost                 STUFF ATTRIBUTE
    IBM                  STUFF TYPE
    personal computer    STUFF
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WORK SHEET! 


1. How do we capture the "meaning" of what a data element represents? 


a. A proposed method for capturing this “meaning” uses a type of formal 
"language" with various rules for forming the definition of the "meaning." This 
definition or description is called "quiddity." 


"From the Oxford English Dictionary, quiddity is 'The real nature or essence 
of a thing; that which makes a thing what it is.' Of course, ... [the 
proposed] language for expressing quiddities is only a model, or 
approximation, of genuine quiddity, if it exists." 


Example 1:

DATABASE 1
- Variable: purchase cost
- Description: "Purchase cost of a truck"

DATABASE 2
- Variable: cost of purchase
- Description: "Cost of purchase of a truck"


b. Let's begin defining the basic component of quiddity. 


Example 1a:

DATABASE 1
- Variable: purchase cost
- Description: "Purchase cost of a truck"
- Dimension: currency
- Stuff: truck

DATABASE 2
- Variable: cost of purchase
- Description: "Cost of purchase of a truck"
- Dimension: currency
- Stuff: truck


1. All examples and quotes in this work sheet have been borrowed from
the following reference: Bhargava, Hemant K., Steven O. Kimbrough, and
Ramayya Krishnan, Unique Names Violations: A Problem For Model
Integration or You Say Tomato, I Say Tomahto (University of Pennsylvania,
Department of Decision Sciences and Carnegie Mellon University, SUPA,
Working Paper, 1990), pp. 5-8.
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2. Now, let's change the variables slightly. 


a. Notice that the description changed but not the "dimension" or
"stuff."


Example 2:

DATABASE 1
- Variable: purchase cost
- Description: "Cost of purchasing a truck"
- Dimension: currency
- Stuff: truck

DATABASE 2
- Variable: production cost
- Description: "Cost of producing a truck"
- Dimension: currency
- Stuff: truck


b. What is the quiddity? 


Sample line of reasoning used in Example 2a to describe "quiddity."


"Both variables are about the same stuff: trucks. They differ
in what it is they represent about trucks. What is it about
trucks they describe? Purchasing in one case and production in
the other. What is it about purchasing and production that
they represent? Cost, in both cases. And what about cost?
Nothing else." This line of reasoning suggests the quiddity
descriptions in Example 2a.





Example 2a:

DATABASE 1
- Variable: purchase cost
- Description: "Cost of purchasing a truck"
- Dimension: currency
- Quiddity: cost(purchase(truck))
      cost, purchase    STUFF ATTRIBUTES
      truck             STUFF
- Quiddity Paraphrase: "the cost of purchase of a truck"

DATABASE 2
- Variable: production cost
- Description: "Cost of producing a truck"
- Dimension: currency
- Quiddity: cost(production(truck))
      cost, production    STUFF ATTRIBUTES
      truck               STUFF
- Quiddity Paraphrase: "the cost of production of a truck"
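
The comparison made in Examples 1 and 2a can be mimicked by splitting each quiddity into its terms and comparing them position by position. This is a rough sketch under simplifying assumptions: the quiddities have no arity arguments, and two terms are equivalent only when they are spelled identically. The function names are hypothetical.

```python
from typing import Dict, List


def parse(quiddity: str) -> List[str]:
    """Split 'cost(purchase(truck))' into ['cost', 'purchase', 'truck'],
    outermost term first. Assumes the stuff has no arity arguments."""
    terms = [t.strip() for t in quiddity.replace(")", "").split("(")]
    return [t for t in terms if t]


def compare(q1: str, q2: str) -> Dict[str, object]:
    """Report whether two quiddities share their stuff and where they differ."""
    t1, t2 = parse(q1), parse(q2)
    return {
        "same_stuff": t1[-1] == t2[-1],  # the innermost term is the stuff
        "identical": t1 == t2,
        "differing_terms": [(a, b) for a, b in zip(t1, t2) if a != b],
    }


# Example 2a: same stuff (truck), different stuff attribute.
print(compare("cost(purchase(truck))", "cost(production(truck))"))
# Example 1: both variables reduce to the same quiddity.
print(compare("cost(purchase(truck))", "cost(purchase(truck))")["identical"])  # True
```

Identical quiddities under different variable names point to a possible synonym; differing quiddities do not.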
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3. Below is a list of several data elements in a “Home Inventory" database. 
See if you can define the quiddity for each data element. 


DATA DICTIONARY EXCERPT:

FIELD NAME     TYPE              DESCRIPTION

ITEM           1{character}40    Identifies a specific piece of
                                 property, i.e., sofa, dining room
                                 chair, TV, etc.

QUANTITY       1{integer}3       Identifies the total number of like
                                 items or pieces of property owned,
                                 i.e., "2" if two sofas are owned.

VALUE          1{integer}8       Identifies the current replacement
                                 cost of a specific piece of
                                 property.

DATE           1{date}8          The month, day, and year the
                                 property was purchased or acquired.

PRICE          1{integer}8       Identifies the amount paid for a
                                 specific piece of property.

WEIGHT         1{integer}5       The total number of pounds a
                                 specific piece of property weighs.

FREE WEIGHT    1{logical}1       Whether weight of a specific piece
                                 of property applies toward the
                                 professional weight allowance or
                                 not, i.e., "Y" if yes or "N" if no.





            QUIDDITY
DATA        ARITY          STUFF    STUFF    STUFF        STUFF ATTRIBUTE
ELEMENT     (ARGUMENTS)             TYPE     ATTRIBUTE    TYPE





























EXPERIMENT #1 
Instructions: 


1. Determine the quiddity for each data element listed. Record the components 
of the quiddity in the appropriate columns of the row listing the data element. 
Please write legibly. 


2. Please keep track of the order in which you determine the quiddity components 
for each data element by placing a number in the upper left corner of the 
appropriate "box" in the table. For example, if the first term you define for 
the first data element is its stuff, the second term is its stuff type, and the 
third term is its stuff attribute, the table would look like this: 


DATA        STUFF    ARITY    STUFF    STUFF        STUFF ATTRIBUTE
ELEMENT                       TYPE     ATTRIBUTE    TYPE

Computer    1 PC              2 IBM    3


3. Each quiddity may or may not have a term for each component. (HINT: You 
will always have at least a "stuff" component and a "stuff attribute" component.) 
Some quiddities may have more than one term for a component. If there is more 
than one term, write both terms in the "box" and place its ordering number to the 
left of each term. 


4. I am interested in the "method" you use in determining the quiddity,
particularly in the "thought process" you go through in working through this
experiment. Any comments or suggestions you have (even in bullet form) are
appreciated.


COMMENTS: 














VIRUS DATABASE


            QUIDDITY
DATA        ARITY          STUFF    STUFF    STUFF        STUFF ATTRIBUTE
ELEMENTS    (ARGUMENTS)             TYPE     ATTRIBUTE    TYPE
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HARDWARE AND SOFTWARE TRACKING SYSTEM DATABASE 


            QUIDDITY
DATA        ARITY          STUFF    STUFF    STUFF        STUFF ATTRIBUTE
ELEMENTS    (ARGUMENTS)             TYPE     ATTRIBUTE    TYPE
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addition 

alias 

brand 

building 

bytes 
commercial 
compatibility 
compatible 
component 
computer 
damage 

destroy 
disinfectant 
disk 

disk boot sector 
executable files 
file 

general 
hardware 

IBM 
identification 
indicator 
information 
internal 


LAN 


VOCABULARY 
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literature 
location 
machine 
manufacturer 
model 

name 
network 
number 
office 
operating system 
piece 
Publisher 
receipt 
reference 
sector 
serial 
site 

size 
software 
source 
supplier 
system 
vender 
vendor 


virus 


VIRUS DATABASE DATA DICTIONARY

FIELD NAME          TYPE              DESCRIPTION

ALIAS               1{character}20    Commonly used alias

BOOT SECTOR         1{character}1     Whether or not the virus corrupts the disk boot
                                      sector

COMMAND COM         1{character}1     Whether or not the virus infects the system

DISINFECTANT        1{character}10    Name of a commercially available virus
                                      disinfection routine which is known to
                                      successfully remove this virus

EXE FILES           1{character}1     Whether or not the virus infects EXE files

MACHINE TYPE        1{character}10    Name of a commercial computer type

OPERATING SYSTEM    1{character}10    Name of the operating system used

REFERENCE           1{character}80    Significant literature reference for virus

SIZE                1{integer}5       Size of virus in number of bytes

VENDOR              1{character}80    Commercial source of disinfectant product

VIRUS               1{character}20    Name of each virus which infects a computer

IBM                 1{character}1     Whether or not the computer system is IBM or
                                      IBM compatible

Notes on the TYPE column:
1. Nulls allowed
2. "y" or "n" only, no nulls
3. No nulls
4. Small integer, 0-32767
5. Unique key, no nulls
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FIELD NAME 


HARDWARE AND SOFTWARE TRACKING SYSTEM DATABASE DATA DICTIONARY

FIELD NAME     TYPE              DESCRIPTION

COMPATIBLE     1{character}1     Identifies a piece of software as being
                                 compatible with IBM or Apple

DESCRIPT       1{character}30    Identification type of a piece of hardware,
                                 i.e., keyboard, monitor, etc.

HLAN           1{logical}1       Identifies a piece of hardware as local area
                                 network compatible (True) or not (False)

[illegible]    1{character}30    Identifies the brand of a piece of hardware

MODEL          1{character}15    Identifies the model number/type of a piece of
                                 hardware, i.e., "VGA" for a monitor or "286"
                                 for Zenith PC, etc.

NAME           1{character}30    Identifies the name of a piece of software

OFFICE         1{character}4     Identification number of an office that is
                                 inside a building

PUBLISHER      1{character}30    Identifies the name of a software publisher

REMARKS        1{character}80    General remarks about a piece of hardware

SITE           1{logical}1       Identifies a piece of software as having a site
                                 license (True) or not (False)

SSERIAL        1{character}25    Identifies the serial number of a piece of
                                 software

VENDER         1{character}30    Identifies the name of a hardware vender
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Virus Name 


ХАТ 

1392 

1210 

1720 
Saturday 14th 
Korea 

Vcomm 
ItaVir 
Solano 
V2000

1554 

512 

EDV 

Joker 
Icelandic-3 
Virus-101 
1260 
Perfume 
Taiwan 
Chaos 
Virus-90 
Oropax 

4096 
Devil's Dance 
Amstrad 
Payday 


Datacrime II-B 


Sylvia 
Do-nothing 
Sunday 
Lisbon 
Typo 


Virus Database Listing
Updated 04 April


Disinfector (in listing order):
cleanup, cleanup, cleanup, cleanup, cleanup, m-disk, cleanup, cleanup,
cleanup, cleanup, scan, scan, m-disk, cleanup, cleanup, cleanup,
cleanup, cleanup, cleanup, m-disk, cleanup, cleanup, cleanup, cleanup,
cleanup, cleanup, cleanup, cleanup, cleanup, cleanup, cleanup, cleanup

Bytes (legible values, in listing order):
853, 2560, 1260, 769, 708, 857, 2773, 4096, 941, 847, 1808, 1917, 1332,
608, 1636, 648, 867


Key -

INFECTION AREAS
P - disk partition table
H - fixed disk boot sector
O - .OVR files
C - .COM files
floppy disk boot sector
.EXE files
COMMAND.COM

FEATURES
M - remains memory resident
Bytes - virus size
self encrypting

DAMAGE CAUSED
B - corrupts disk boot sector
P - corrupts .COM, .EXE, .OVR files
formats part or all of disk
degrades system operation
corrupts data files
corrupts file linkage
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5/13/90 
Report of Software by Name and Version


Columns: Version, Publisher, License, Serial Number, Procurement Number,
Date Received, LAN Compatible, Hardware Compatible

** Name AMI PRO
2.00 SAMNA 76650755 898023433 08/20/89 No IBM

** Name C COMPILER
5.00 MICRO SOFT 4990-34 879239342 8980341 03/05/89 No IBM

** Name DBASE III
1.10 ASHTON TATE 23940044-4 99540 90RQ123K 02/05/90 No IBM

** Name DBASE IV
1.00 ASHTON TATE 9823-332-112 1001-02 89801234 01/03/90 Yes IBM

** Name DESKTOP PUBLISHER
1.00 DIGITAL RESEARCH 9837548 185494 90800330 04/01/90 No IBM

** Name GEM DRAW
2.00 DIGITAL RESEARCH 77-343 987244-211 9080234 01/02/90 No IBM

** Name HARVARD GRAPHICS
2.00 ALUS 398-24 87R0334 03/05/87 No IBM

** Name LOTUS 123
2.00 LOTUS DEVELOPMENT CORP 7358-67-8863 4568-23 87RQ123E 03/10/87 No IBM

** Name PFS: PROFESSIONAL WRITE
3.00 PFS 83896 230096 89801238 07/02/89 No IBM

** Name PRESENTATION
1.20 ALDUS 877-23 89R0433R 04/19/89 No IBM

** Name RENEX TMS
2 7% RENEX 221922840 98-12339 87808732 04/02/87 No IBM

** Name RIGHT WRITER
1.20 PFS 345-A349 888034K0 02/13/88 No IBM

** Name TIME-LINE
4.00 SYMANTIC 13003-234-2333 2340-123-11111 90R012E2 01/02/90 Yes IBM

** Name WINDOWS
2.00 MICROSOFT 2134321809 77648766 9080023 07/10/89 No IBM
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06/13/90 
Report of Software that is LAN Compatible


Software Name    Publisher

** Hardware Type IBM
DBASE IV         ASHTON TATE
TIME-LINE        SYMANTIC
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Version 


Pe 


. OO 


06/13/90
Report of Hardware Procurement by Procurement Number


          Serial                                                         Procurement
Number    Number          Make              Model          Description   LAN    Date        Received

** Procurement Number 87RQE1203
** Vendor ZENITH
00100     932WFO381TS2    ZENITH            286            KEYBOARD      No     03/12/87    03/12/87

** Procurement Number 89803432
** Vendor COMPUADD
38        1051867         VENTEL            24008          MODEM         No     05/06/89    05/06/89

** Procurement Number 89RQ345K
** Vendor HEWLET PACKARD
35        284181979       HEWLET PACKARD    AT COMP        CPU           No     05/09/89    05/09/89

** Procurement Number 89R098D
** Vendor HEWLET PACKARD
36        61577553        HEWLET PACKARD    VGA            MONITOR       No     04/18/89    04/18/89

** Procurement Number 89R0E1234
** Vendor APPLE
4         F851EEXM5825    MACINTOSH         [illegible]    CPU           No     02/12/89    02/12/89

** Procurement Number 89RQE234
** Vendor APPLE
2         669944          MACINTOSH         [illegible]    KEYBOARD      No     02/15/89    02/15/89

** Procurement Number 90R0E3401
** Vendor ZENITH
2         933NE0306T00    ZENITH            286            MONITOR       No     01/12/90    01/12/90


VIRUS DATABASE 
Subject A 


DATA QUIDDITY 
ELEMENTS 
ARITY STUFF STUFF 
(ARGUMENTS) TYPE ATTRIBUTE 
n RES је 
disk-boot- 
BOOT SECTOR sector indicator damage 
disenfectant 
software commercial 
executable 
files indicator damage 
pg ти a Ee 
T 2-21 ешігті 
commercial 
vendor disinfectant 
Te к | 
compatibility 
system computer indicator IBM 


15 























VIRUS DATABASE 








Subject B 
DATA QUIDDITY 
ELEMENTS STUFF 
ARITY STUFF ATTRIBUTE 
ARGUMENTS ATTRIBUTE TYPE 





virus 


| disk boot 
information damage sector 





BOOT SECTOR virus 





COMMAND COM 





virus 


disinfectant destroy 


executable 
information damage files 





EXE FILES virus 





operating 
OPERATING SYSTEM computer information system 


virus information reference 








virus information size 











| o virus information | disinfectant Source 
VIRUS virus bee information 
computer information operating compatible 
system IBM 
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VIRUS DATABASE 


Subject C 
DATA QUIDDITY 
ELEMENTS STUFF 
ARITY STUFF STUFF ATTRIBUTE 
(ARGUMENTS ) TYPE ATTRIBUTE TYPE 





alias 


disk ` 
BOOT SECTOR category damage indicator boot sector 
COMMAND COM NE damage indicator command com 


OPERATING SYSTEM 





executable 





brand 






operating 


system 


reference == рн information] identification 


virus computer size bytes 


IBM 
category computer 












S 


HARDWARE AND SOFTWARE TRACKING SYSTEM DATABASE 











Subject 1 
DATA 
ELEMENTS STUFF 
ARITY STUFF STUFF ATTRIBUTE 
(ARGUMENT TYPE ATTRIBUTE TYPE 
5) 
COMPATIBLE software piece compatible IBM 
(Apple) 
hardware component identifi- 
cation 
hardware үч component compatible lan 
hardware "T component brand 
MODEL hardware ғұ component model 
software nem piece 


office 


building 
hardware БЕ component 


PUBLISHER publisher software 


remarks general 


company 


software | location piece license site 


component company 
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serial 


vendor hardware 


Ue Ц 
"n 
npud 


HARDWARE AND SOFTWARE TRACKING SYSTEM DATABASE 
Subject 2 


DATA QUIDDITY 


ELEMENTS 
ARITY STUFF STUFF 
ARGUMENTS TYPE ATTRIBUTE 





vendor 


COMPATIBLE software piece compatibility 


ЕХЕ еВ 

lan 

compati- hardware indicator 
bility 


hardware 








hardware 


vendor 
component brand 
pm сни 


identification 
building number 


MODEL hardware 


NAME software 


(D 
H 





[z— 
site 
license software indicator 
serial 
software piece number 
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HARDWARE AND SOFTWARE TRACKING SYSTEM DATABASE 
Subject 3 


DATA QUIDDITY 





ELEMENTS 
ARITY STUFF 

(ARGUMEN ATTRIBUTE 

TS) 





COMPATIBLE indicator 


compatibility 


category 


infomation 


general 


РЄ 


hardware 


category compatibility indicator 


hardware brand 


number 
model 


identifica- 
tion 


MODEL hardware 





software 


site 
category license indicator 
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B tj 
- Ë 
H ү 


PUBLISHER 


serial 


| H: 
УДА АД 


VIRUS DATABASE 
MASTER 





при. 


....4-: 


TF eb ate 


software 


virus 
COMMAND COM damage system indicator 





< 
lE: 
H 
e 
(n 
б 
F 
= 
0 
tn 








vlrus 


MACHINE TYPE hardware 


virus 
executable files 





operating_ 
system 


OPERATING SYSTEM 


software 





literature 


reference virus 


virus 


software 


vendor disinfectant 





software 
compati- brand hardware indicator 
bility 
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HARDWARE AND SOFTWARE TRACKING SYSTEM DATABASE 
MASTER 





sss s ss 


еее | еее 
ғ 


wenn ow e EC ОТО ТР к.н 


indicator 


жж 






COMPATIBLE 





compatibility 





hardware 
software 


| 
| 





internal 


PUBLISHER publisher software мі 


депега1 


OFFICE 


| = | 





hardware information 


site 
license software indicator 


жж 


** The words "model", "serial", and "number" were provided separately in the vocabulary list of experiment #1.
Based on the comments received after experiment #1, the words should have been provided as a "word group" as shown in
the table above. This combination better describes the stuff attribute.








APPENDIX B -- PROTOTYPE IMPLEMENTATION 
This Appendix contains a copy of the Prolog program listing (the prototype) and a list of the data
used by the prototype (i.e., experiment and master data). Additionally, a sample report/list
produced by the program is included. The items appear in the following order:

Prolog Program Listing (tomato.pro)
Sample Report
Primary Experiment Data, Synonym List (Vocabulary), and Classification Network
Master Data (masterDATA.DOS)





| 


Page: 1 tomato.pro


/* To run the system, here's what happens:

   1. Across DB Test:

      Perform step 2 for every pair of subjects
      (S1,S2) where S1 wrote quiddities for DB1
      and S2 wrote quiddities for DB2.

   2. For a given pair of subjects (S1,S2):
      perform the quiddity test with each pair
      of data elements (E1,E2) where E1 is in
      DB1 and E2 is in DB2.

*/


/* Problems with current implementation (3-5-91):

*/

/* Results:
   1. Identical element pairs. (Names and quiddities equal.)
   2. Synonym pairs.
   3. Homonym pairs.
   4. No action pairs. (Quiddities and names unequal.)
*/

/* Example:
   acrossDBtest(ProcNo, [(naf,[a,b,c]), (craft,[d,e,f])]). */

go :-
    write('Procedure No? '), read(ProcNo),
    nl,
    write('First Database Name? '), read(DB1),
    nl,
    write('First Subject Name? '), read(Subject1),
    nl,
    write('Second Database Name? '), read(DB2),
    nl,
    write('Second Subject Name? '), read(Subject2),
    acrossDBtest2(ProcNo, (DB1,Subject1), (DB2,Subject2)).


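/* Failure-driven loop: test every pair of data elements, one from
   each database, then fall through to the reporting clause below. */
acrossDBtest2(ProcNo, (DB1,Subject1), (DB2,Subject2)) :-
    quiddity(DB1,Subject1,Element11,_,_,_,_,_),
    quiddity(DB2,Subject2,Element21,_,_,_,_,_),
    quiddity_eq(ProcNo, [DB1,Subject1,Element11],
                [DB2,Subject2,Element21], Result),
    fail.

acrossDBtest2(ProcNo, (DB1,Subject1), (DB2,Subject2)) :-
    printreport(ProcNo, (DB1,Subject1), (DB2,Subject2)),
    retractall(tomato(_,_,_,_)).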


/* For assertion of results of quiddity test. */ 


determine(ProcNo, [DB1,Sub1,E1], [DB2,Sub2,E2], Answer, Assertion) :-
    (E1 = E2,
     Answer = yes,
     asserta(tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],match));

     E1 = E2,
     Answer = no,
     asserta(tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],homonym));

     not(E1 = E2),
     Answer = yes,
     asserta(tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],synonym));

     not(E1 = E2),
     Answer = no,
     asserta(tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],relax))).


/*
   RULES FOR EQUALITY:
*/

/* QUIDDITY EQUIVALENCE: equivalence of quiddities of two data elements. */

/* quiddity_eq(Element1,Element2,Answer).
   Answer will be yes, or no. */

quiddity_eq(ProcNo,E1,E2,Answer) :-
    (stuff_set_eq(ProcNo,E1,E2,yes),
     stuff_attribute_set_eq(ProcNo,E1,E2,yes), Answer = yes;
     Answer = no),
    determine(ProcNo,E1,E2,Answer,Assertion), !.


/* STUFF-SET EQUIVALENCE: equivalence of stuff-sets. */
/* stuff_set_eq(ProcNo,E1,E2,Answer). */
/* stuff-set(A) =~ stuff-set(B)
   if stuff_set_eq(ProcNo,A,B,yes). */


/* Set 2/Rule 1 -- The stuff-sets are equal if the stuff terms
   are equal, the arity terms are equal, and the
   stuff type terms are equal.
*/

stuff_set_eq((TE,1,SAE),A,B,yes) :-
    ProcNo = (TE,1,SAE),
    stuff(A,SA), stuff(B,SB),
    arity(A,ArA), arity(B,ArB),
    stuff_type(A,StA), stuff_type(B,StB),
    term_eq(ProcNo,SA,SB,yes),
    term_eq(ProcNo,ArA,ArB,yes),
    term_eq(ProcNo,StA,StB,yes).


/* Set 2/Rule 2 -- The stuff-sets are equal if the stuff terms
   are equal, and the arity terms of one is
   contained in the arity term of the other,
   and if the stuff type term of one is contained
   in the stuff type term of the other.
*/

stuff_set_eq((TE,2,SAE),A,B,yes) :-
    ProcNo = (TE,2,SAE),
    stuff(A,SA), stuff(B,SB),
    arity(A,ArA), arity(B,ArB),
    stuff_type(A,StA), stuff_type(B,StB),
    term_eq(ProcNo,SA,SB,yes),
    contained_in_check(ProcNo,ArA,ArB,yes),
    contained_in_check(ProcNo,StA,StB,yes).


/* Set 2/Rule 3 -- The stuff-sets are equal if the stuff terms
   are equal, and the arity + stufftype terms of one are
   contained in the arity + stufftype term of the other.
*/

stuff_set_eq((TE,3,SAE),A,B,yes) :-
    ProcNo = (TE,3,SAE),
    stuff(A,SA), stuff(B,SB),
    arity(A,ArA), arity(B,ArB),
    stuff_type(A,StA), stuff_type(B,StB),
    term_eq(ProcNo,SA,SB,yes),
    append(ArA,StA,TotalA),
    append(ArB,StB,TotalB),
    contained_in_check(ProcNo,TotalA,TotalB,yes).


/* Set 2/Rule 4 -- The stuff-sets are equal if
   the stuff + arity + stufftype terms of one are
   contained in the stuff + arity + stufftype term of the other.
*/

stuff_set_eq((TE,4,SAE),A,B,yes) :-
    ProcNo = (TE,4,SAE),
    stuff(A,SA), stuff(B,SB),
    arity(A,ArA), arity(B,ArB),
    stuff_type(A,StA), stuff_type(B,StB),
    append(ArA,[SA],SubTotalA),
    append(ArB,[SB],SubTotalB),
    append(SubTotalA,StA,TotalA),
    append(SubTotalB,StB,TotalB),
    contained_in_check(ProcNo,TotalA,TotalB,yes).


/* STUFF-attribute-SET EQUIVALENCE: equivalence of stuff-attribute-sets. */
/* stuff_attribute_set_eq(ProcNo,E1,E2,Answer). */
/* stuff_attribute-set(A) =~ stuff_attribute-set(B)
   if stuff_attribute_set_eq(ProcNo,A,B,yes). */


/* Set 3/Rule 1 -- The stuff attribute-sets are equal if
   the stuff_attribute terms are equal, and the
   stuff_attribute_type terms are equal.
*/

stuff_attribute_set_eq((TE,SE,1),A,B,yes) :-
    ProcNo = (TE,SE,1),
    stuff_attribute(A,SaA), stuff_attribute(B,SaB),
    stuff_attribute_type(A,SatA), stuff_attribute_type(B,SatB),
    term_eq(ProcNo,SaA,SaB,yes),
    term_eq(ProcNo,SatA,SatB,yes).


/* Set 3/Rule 2 -- The stuff attribute-sets are equal if
   the stuff_attribute terms are equal,
   and if the stuff_attribute_type term of one is contained
   in the stuff_attribute_type term of the other.
*/

stuff_attribute_set_eq((TE,SE,2),A,B,yes) :-
    ProcNo = (TE,SE,2),
    stuff_attribute(A,SaA), stuff_attribute(B,SaB),
    stuff_attribute_type(A,SatA), stuff_attribute_type(B,SatB),
    term_eq(ProcNo,SaA,SaB,yes),
    contained_in_check(ProcNo,SatA,SatB,yes).


/* Set 3/Rule 3 -- The stuff attribute-sets are equal if
   the stuff_attribute + stuff_attributetype terms of one are
   contained in the stuff_attribute + stuff_attributetype term of the other.
*/

stuff_attribute_set_eq((TE,SE,3),A,B,yes) :-
    ProcNo = (TE,SE,3),
    stuff_attribute(A,SaA), stuff_attribute(B,SaB),
    stuff_attribute_type(A,SatA), stuff_attribute_type(B,SatB),
    append([SaA],SatA,TotalA),
    append([SaB],SatB,TotalB),
    contained_in_check(ProcNo,TotalA,TotalB,yes).


/* TERM EQUIVALENCE: Term Equivalence Rules:

   format: term_eq(WhichRule, Term1, Term2).
   succeeds when Term1 and Term2 are equivalent
   under WhichRule. */


/* To take care of the case when Term1 and Term2 are lists. */
/* In such cases, see if all elements of Term1 are
   "contained in" Term2, and vice versa.

   The predicate term_list_LtoR_eq takes care of the above. */

term_eq(ProcNo,[H1|T1],[H2|T2],yes) :-
    term_list_LtoR_eq(ProcNo,[H1|T1],[H2|T2],yes),
    term_list_LtoR_eq(ProcNo,[H2|T2],[H1|T1],yes).

term_list_LtoR_eq(ProcNo,[],List2,yes).
term_list_LtoR_eq(ProcNo,[First1|Rest1],List2,yes) :-
    contained_in(ProcNo,[First1],List2),
    /* Recurse on the rest of the list; the empty-list clause
       above ends the check. */
    term_list_LtoR_eq(ProcNo,Rest1,List2,yes).


/* Set 1/Rule 1 -- Terms are equal if they match syntactically */

term_eq((1,_,_),A,B,yes) :-
    A = B.


/* Set 1/Rule 2 -- Terms are equal if Rule 1 is true
   or if A and B are synonyms */

term_eq((2,X,Y),A,B,yes) :-
    term_eq((1,X,Y),A,B,yes);
    synonym(A,B).


/* Set 1/Rule 3 -- Terms are equal if Rule 1 or
   Rule 2 are true, or if A and B are related,
   i.e., they are contained in the same inheritance hierarchy. */

term_eq((3,X,Y),A,B,yes) :-
    term_eq((1,X,Y),A,B,yes);
    term_eq((2,X,Y),A,B,yes);
    is_a(A,B);
    is_a(B,A).


/* If none of these work, then (A,B) are not equivalent. */
term_eq(_,A,B,no(A,B)) :-
    !, fail.
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/* UTILITIES */ 


/* contained_in_check(Set1,Set2,SuperSet).
   succeeds when Set1 is contained in Set2,
   SuperSet indicates which one is the larger. */

contained_in_check(ProcNo,Set1,Set2,yes) :-
    contained_in(ProcNo,Set1,Set2).

contained_in_check(ProcNo,Set1,Set2,yes) :-
    contained_in(ProcNo,Set2,Set1).


/* contained_in(Set1,Set2).
   succeeds when Set1 is contained in Set2. */

/* empty list is contained in AnySet. */
contained_in(ProcNo,[],AnySet).

/* A set containing only 1 Member is contained in Set2
   if Member is a member of Set2. */
contained_in(ProcNo,[Member],[First|Rest]) :-
    term_eq(ProcNo,Member,First,yes);
    not(Rest = []),
    contained_in(ProcNo,[Member],Rest).

/* A set containing a First member and the Rest of the set,
   is contained in Set2 if Set2 contains the First member
   as well as the Rest of the set. */
contained_in(ProcNo,[First|Rest],Set2) :-
    not(Rest = []),
    contained_in(ProcNo,[First],Set2),
    contained_in(ProcNo,Rest,Set2).


printreport(ProcNo, (DB1,Sub1), (DB2,Sub2)) :-
    write('Please enter the name of the output file: '),
    read(FileName),
    tell(FileName),
    write('===================================='), nl,
    printlist(['Results for procedure: ',ProcNo,' applied to ',
               [DB1,Sub1],' and ',[DB2,Sub2]]),
    nl,
    write('List of matches: '),
    setof0((E1,E2),tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],match),MatchList),
    printlist(MatchList),
    nl,nl,
    write('List of homonyms: '),
    setof0((E1,E2),tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],homonym),HomList),
    printlist(HomList),
    nl,nl,
    write('List of synonyms: '),
    setof0((E1,E2),tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],synonym),SynList),
    printlist(SynList),
/*
    nl,nl,
    write('List of garbage: '),
    setof0((E1,E2),tomato(ProcNo,[DB1,Sub1,E1],[DB2,Sub2,E2],relax),Garbage),
    printlist(Garbage),
*/
    told.


printlist([]).
printlist([H|T]) :-
    nl,
    write(H),
    printlist(T).


/* setof0 behaves like setof, but yields [] when there are no solutions. */
setof0(X,Y,Z) :- setof(X,Y,Z), !.
setof0(_,_,[]) :- !.


/* Assume that synonyms are declared using the predicate synonyms/2.
   For example, synonyms(cost,price). */

synonym(A,B) :-
    synonyms(A,B).

synonym(A,B) :-
    synonyms(B,A).

/* synonym(A,B) :-
       synonyms(A,C),
       synonym(C,B).

   synonym(A,B) :-
       synonyms(C,A),
       synonym(C,B). */


/* Multi-level classification hierarchies */
/* Assume that if A is a B, there is a predicate isA(A,B) */

is_a(A,B) :-
    isA(A,B).

is_a(A,B) :-
    isA(A,C),
    is_a(C,B).


/* Retrieval of quiddity terms. */

stuff([DB,Subject,Element],Stuff) :-
    quiddity(DB,Subject,Element,Stuff,_,_,_,_).

arity([DB,Subject,Element],Arity) :-
    quiddity(DB,Subject,Element,_,Arity,_,_,_).

stuff_type([DB,Subject,Element],StuffType) :-
    quiddity(DB,Subject,Element,_,_,StuffType,_,_).

stuff_attribute([DB,Subject,Element],StuffAttribute) :-
    quiddity(DB,Subject,Element,_,_,_,StuffAttribute,_).

stuff_attribute_type([DB,Subject,Element],StuffAttributeType) :-
    quiddity(DB,Subject,Element,_,_,_,_,StuffAttributeType).

—«— “Ж SEO SE PE DE AE A A — A - 2. ғ 


Results for procedure: 222 
Applied to NAC Subject A and Craft Subject 2 


(*First data element listed is from NAC and second is from CRAFT) 


List of matches: 


NONE 


List of homonyms: 


firstname, firstname 
lastname, lastname 
section, section 


List of synonyms:


crs,crs_name
crs,crs_number
dpt,dept
emph_area,emph
emph_area,emph_name
length,coreqtr
preq_crs,crs_name
preq_crs,crs_number
preq_crs,pre_req_num
preq_dpt,dept
preq_dpt,pre_req_dept
req_crs,crs_name
req_crs,crs_number
section,coreqtr
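
The way the report sorts element pairs into its lists follows from two yes/no questions: are the names the same, and are the quiddities judged equivalent? The following is a minimal Python sketch of that decision, not the prototype's code (the prototype is the Prolog predicate determine shown earlier). The function name `classify` and the result labels are assumptions made for illustration; the two sample pairs are taken from the report above.

```python
def classify(name1, name2, quiddities_equal):
    """Place a pair of data elements into one of the four result categories."""
    same_name = name1 == name2
    if same_name and quiddities_equal:
        return "match"       # identical element pair
    if same_name:
        return "homonym"     # same name, different quiddities
    if quiddities_equal:
        return "synonym"     # different names, equivalent quiddities
    return "no action"       # names and quiddities both unequal


# Pairs taken from the sample report: (NAC element, CRAFT element).
print(classify("firstname", "firstname", False))  # homonym
print(classify("dpt", "dept", True))              # synonym
```

Whether two quiddities count as equivalent depends on the procedure number chosen, so the same pair can move between categories from one procedure to another.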
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Primary Experiment Data, Synonym List (Vocabulary), 
and Classification Network 


quiddity(nac,nr,crs,course,[],[nps],designator,[]).
quiddity(nac,nr,curr_ofcr,manager,[curriculum],[military_officer],surname,[]).
quiddity(nac,nr,curric_nam,curriculum,[],[nps],title,[]).
quiddity(nac,nr,degree_tit,degree,[],[nps],title,[]).
quiddity(nac,nr,dpt,department,[],[nps],identifier,[]).
quiddity(nac,nr,firstname,person,[],[],name,[given]).
quiddity(nac,nr,lastname,person,[],[nps],surname,[]).
quiddity(nac,nr,section,time_period,[],[course],designator,[]).
quiddity(nac,nr,hours,course,[],[],credit,[]).
quiddity(nac,nr,prof_phone,professor,[],[nps],telephone_number,office).
quiddity(nac,nr,emph_area,curriculum,[],[emphasis_area],credits,[required]).
quiddity(nac,nr,length,curriculum,[],[],term,[required]).
quiddity(nac,nr,preq_crs,course,[],[prerequisite,required],designator,[]).
quiddity(nac,nr,preq_dpt,department,[],[prerequisite,required],identifier,[]).
quiddity(nac,nr,req_crs,emphasis_area,[],[course,required],designator,[]).
quiddity(nac,rg,crs,course,[],[nps],identifier,[]).
quiddity(nac,rg,curr_ofcr,person,[],[military_officer],name,[surname]).
quiddity(nac,rg,curric_nam,curriculum,[curriculum],[],name,[]).
quiddity(nac,rg,degree_tit,degree,[],[nps],title,[]).
quiddity(nac,rg,dpt,department,[],[nps],name,[]).
quiddity(nac,rg,firstname,person,[],[],name,[given]).
quiddity(nac,rg,lastname,person,[],[],name,[surname]).
quiddity(nac,rg,section,time_period,[],[course],identifier,[]).
quiddity(nac,rg,hours,course,[],[],credits,[]).
quiddity(nac,rg,prof_phone,professor,[],[],telephone_number,[office]).
quiddity(nac,rg,emph_area,emphasis_area,[curriculum],[curriculum],name,[]).
quiddity(nac,rg,length,time_period,[],[curriculum],term,[]).
quiddity(nac,rg,preq_crs,course,[],[prerequisite],designator,[]).
quiddity(nac,rg,preq_dpt,department,[],[course],designator,[]).
quiddity(nac,rg,req_crs,course,[],[emphasis_area],designator,[required]).
quiddity(nac,mg,crs,course,[],[nps],designator,[]).
quiddity(nac,mg,curr_ofcr,manager,[curriculum],[military_officer],surname,[]).
quiddity(nac,mg,curric_nam,curriculum,[],[nps],title,[]).
quiddity(nac,mg,degree_tit,degree,[],[],title,[]).
quiddity(nac,mg,dpt,department,[],[nps],identifier,[]).
quiddity(nac,mg,firstname,person,[],[nps],name,[]).
quiddity(nac,mg,lastname,person,[],[nps],surname,[]).
quiddity(nac,mg,section,time_period,[],[class],identifier,[]).
quiddity(nac,mg,hours,course,[],[],credits,[]).
quiddity(nac,mg,prof_phone,professor,[],[nps],telephone_number,[]).
quiddity(nac,mg,emph_area,emphasis_area,[],[],title,[]).
quiddity(nac,mg,length,time_period,[],[curriculum],designator,[]).
quiddity(nac,mg,preq_crs,course,[],[prerequisite,nps],designator,[]).
quiddity(nac,mg,preq_dpt,department,[],[prerequisite,nps],identifier,[]).
quiddity(nac,mg,req_crs,course,[],[required,nps],designator,[]).

quiddity(craft,jc,ssn,student,[],[],identifier,[social_security]).
quiddity(craft,jc,lastname,student,[],[],surname,[]).
quiddity(craft,jc,firstname,student,[],[],name,[given]).
quiddity(craft,jc,section,section,[],[nps],designator,[]).
quiddity(craft,jc,dept,department,[],[nps],identifier,[]).
quiddity(craft,jc,crs_number,course,[],[nps],identifier,[]).
quiddity(craft,jc,crs_name,course,[],[nps],name,[]).
quiddity(craft,jc,emph,emphasis_area,[],[nps],identifier,[]).
quiddity(craft,jc,emph_name,emphasis_area,[],[nps],name,[]).
quiddity(craft,jc,pre_req_dept,department,[],[prerequisite,nps],identifier,[]).
quiddity(craft,jc,pre_req_num,course,[],[prerequisite,nps],identifier,[]).
quiddity(craft,jc,qtr_name,quarter,[],[],name,[]).
quiddity(craft,jc,qtr,quarter,[],[],identifier,[]).
quiddity(craft,jc,yr,year,[],[],identifier,[]).
quiddity(craft,jc,coreqtr,quarter,[student],[current],identifier,[]).
quiddity(craft,js,ssn,student,[],[],identifier,[social_security]).
quiddity(craft,js,lastname,person,[],[],surname,[]).
quiddity(craft,js,firstname,person,[],[],name,[given]).
quiddity(craft,js,section,section,[],[curriculum],identifier,[]).
quiddity(craft,js,dept,department,[],[nps],designator,[]).
quiddity(craft,js,crs_number,course,[],[nps],designator,[]).
quiddity(craft,js,crs_name,course,[],[nps],name,[]).
quiddity(craft,js,emph,emphasis_area,[],[nps],identifier,[]).
quiddity(craft,js,emph_name,emphasis_area,[],[nps],name,[]).
quiddity(craft,js,pre_req_dept,department,[],[prerequisite,nps],designator,[]).
quiddity(craft,js,pre_req_num,course,[],[prerequisite,nps],designator,[]).
quiddity(craft,js,qtr_name,quarter,[],[],name,[]).
quiddity(craft,js,qtr,quarter,[],[],identifier,[]).
quiddity(craft,js,yr,time_period,[course],[],year,[given]).
quiddity(craft,js,coreqtr,quarter,[time],[current],indicator,[]).
quiddity(craft,bt,ssn,student,[],[military_officer],identifier,[social_security]).
quiddity(craft,bt,lastname,student,[],[military_officer],surname,[]).
quiddity(craft,bt,firstname,student,[],[military_officer],name,[]).
quiddity(craft,bt,section,section,[student,curriculum],[],identifier,[]).
quiddity(craft,bt,dept,department,[],[nps],identifier,[]).
quiddity(craft,bt,crs_number,course,[],[],identifier,[]).
quiddity(craft,bt,crs_name,course,[],[],name,[]).
quiddity(craft,bt,emph,emphasis_area,[],[curriculum],designator,[]).
quiddity(craft,bt,emph_name,emphasis_area,[],[curriculum],name,[]).
quiddity(craft,bt,pre_req_dept,department,[],[prerequisite],designator,[]).
quiddity(craft,bt,pre_req_num,course,[],[prerequisite],identifier,[]).
quiddity(craft,bt,qtr_name,time_period,[],[quarter],name,[]).
quiddity(craft,bt,qtr,time_period,[],[],quarter,[]).
quiddity(craft,bt,yr,time_period,[],[],year,[]).
quiddity(craft,bt,coreqtr,quarter,[curriculum,time],[],identifier,[]).

synonym(course,class).
synonym(designator,identifier).
synonym(designator,name).
synonym(designator,title).
synonym(quarter,term).
synonym(quarter,time_period).

isA(professor,person).
isA(manager,person).
isA(student,person).
isA(military_officer,person).
isA(prerequisite,required).
isA(nps,university).
isA(current,time).
isA(quarter,term).
isA(year,term).
isA(title,name).
isA(surname,name).
isA(curriculum,department).
isA(course,curriculum).
isA(emphasis_area,curriculum).
isA(class,course).
isA(section,class).
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Page: 1 masterDATA.DOS 


quiddity(nac,rb,crs,course,[],[nps],identifier,[]).
quiddity(nac,rb,curr_ofcr,manager,[],[curriculum,military_officer],name,[]).
/* quiddity(nac,rb,curric_nam,...): remaining arguments illegible in the source. */
quiddity(nac,rb,degree_tit,degree,[],[nps],title,[]).
quiddity(nac,rb,dpt,department,[],[nps],designator,[]).
quiddity(nac,rb,firstname,person,[],[nps],name,[given]).
quiddity(nac,rb,lastname,person,[],[nps],surname,[]).
/* quiddity(nac,rb,section,time_period,...): remaining arguments illegible in the source. */
/* quiddity(nac,rb,hours,course,...): remaining arguments illegible in the source. */
/* quiddity(nac,rb,prof_phone,professor,...): remaining arguments illegible in the source. */
quiddity(nac,rb,emph_area,emphasis_area,[],[curriculum],title,[]).
quiddity(nac,rb,length,completion,[curriculum],[],term,[required]).
/* quiddity(nac,rb,preq_crs,course,...): remaining arguments illegible in the source. */
/* quiddity(nac,rb,preq_dpt,department,...): remaining arguments illegible in the source. */
quiddity(nac,rb,req_crs,course,[],[emphasis_area,required],identifier,[]).
/* quiddity(craft,rb,ssn,student,...): remaining arguments illegible in the source. */
quiddity(craft,rb,lastname,student,[],[],surname,[]).
quiddity(craft,rb,firstname,student,[],[],name,[given]).
quiddity(craft,rb,section,section,[],[class],identifier,[]).
quiddity(craft,rb,dept,department,[],[nps],identifier,[]).
/* quiddity(craft,rb,crs_number,...): remaining arguments illegible in the source. */
/* quiddity(craft,rb,crs_name,course,...): remaining arguments illegible in the source. */
quiddity(craft,rb,emph,emphasis_area,[],[nps],identifier,[]).
quiddity(craft,rb,emph_name,emphasis_area,[],[nps],title,[]).
quiddity(craft,rb,pre_req_dept,department,[],[prerequisite,nps],identifier,[]).
quiddity(craft,rb,pre_req_num,course,[],[prerequisite,nps],identifier,[]).
quiddity(craft,rb,qtr_name,quarter,[],[],name,[]).
quiddity(craft,rb,qtr,quarter,[],[],identifier,[]).
quiddity(craft,rb,yr,completion,[course,student],[],year,[]).
quiddity(craft,rb,coreqtr,student,[time],[],quarter,[]).


synonyms(course,class).
synonyms(designator,identifier).
synonyms(designator,name).
synonyms(designator,title).
synonyms(quarter,term).
synonyms(quarter,time_period).
synonyms(title,name).
synonyms(title,identifier).
synonyms(name,identifier).
synonyms(quarter,section).
synonyms(term,time_period).


isA(professor,person).
isA(manager,person).
isA(student,person).
isA(military_officer,person).
isA(prerequisite,required).
isA(nps,university).
isA(current,time).
isA(quarter,term).
isA(year,term).
isA(title,name).
isA(surname,name).
isA(curriculum,department).
isA(course,curriculum).
isA(emphasis_area,curriculum).
isA(class,course).
isA(section,class).
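
The vocabulary facts above feed the three term-equivalence rules of the prototype (syntactic match, declared synonyms, and membership in the same classification chain). As a rough illustration only, the rules can be sketched in Python for single terms; the prototype itself is the Prolog listing, and the names `SYNONYMS`, `IS_A`, `synonym`, `is_a`, and `term_eq` below are assumptions of this sketch. The dictionaries hold only a small subset of the facts listed above, and list-valued terms are not handled.

```python
# Hypothetical subset of the synonym facts from the master data listing.
SYNONYMS = {
    ("course", "class"),
    ("designator", "identifier"),
    ("designator", "name"),
    ("quarter", "term"),
}

# Child -> parent links taken from the isA facts (one parent per term here).
IS_A = {
    "student": "person",
    "surname": "name",
    "section": "class",
    "class": "course",
    "course": "curriculum",
    "curriculum": "department",
}


def synonym(a, b):
    """Declared synonyms are symmetric but not chained transitively."""
    return (a, b) in SYNONYMS or (b, a) in SYNONYMS


def is_a(a, b):
    """True if b is reached from a by following parent links upward."""
    parent = IS_A.get(a)
    while parent is not None:
        if parent == b:
            return True
        parent = IS_A.get(parent)
    return False


def term_eq(rule, a, b):
    """Compare two single terms under term-equivalence rule 1, 2, or 3."""
    if a == b:                                    # Rule 1: syntactic match
        return True
    if rule >= 2 and synonym(a, b):               # Rule 2: declared synonyms
        return True
    if rule >= 3 and (is_a(a, b) or is_a(b, a)):  # Rule 3: same hierarchy chain
        return True
    return False


print(term_eq(1, "designator", "identifier"))  # False
print(term_eq(2, "designator", "identifier"))  # True
print(term_eq(3, "section", "course"))         # True
```

Each rule includes the weaker ones, which is why a higher rule number can only turn a "no" into a "yes" and never the reverse.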
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APPENDIX C -- PRIMARY EXPERIMENT 
This Appendix contains samples of the contents of the packet given to the students during
the primary experiment. Additionally, the actual experiment data results, along with the master
quiddities, are provided. Finally, detailed tabular experiment data (from the prototype) is included,
along with several bar graphs for further clarification. The items appear in the following order:

[title illegible]
[title illegible]
Instruction Sheet (with blank answer sheets)
[title illegible]
Data Dictionary
Sample Database Reports
Experiment Quiddity Definitions
Master Quiddity Definitions
Raw Data and Graphs
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EXPERIMENT #2 


A. PURPOSE 


The purpose of this experiment is to gather data to assist me in 
analyzing the concept of "quiddity". The first experiment gave everyone a 
broad view of the quiddity concept and practice in applying the concept to a 
database. The second experiment is the more important of the two and will be 
more formally structured. I am still interested in any and all comments you 
may have regarding quiddity. 


B. REVIEW OF THE QUIDDITY CONCEPT 


"Quiddity" is the name given to the description of what information is 
captured by the data element. We are attempting to capture the "meaning" of 
what the data element represents. 


1. Components of Quiddity 


a. Quiddity is made up of five components: stuff, stuff type,
stuff attribute, stuff attribute type, and arity. To find values for these
components, we must answer the following questions.


* STUFF- What is it about?

* STUFF TYPE- What sort of stuff is it?

* STUFF ATTRIBUTE- What is it about the stuff you are
interested in?

* STUFF ATTRIBUTE TYPE- What sort of stuff attribute is it?

* ARITY- What is the stuff a function of?


b. Some important "rules of thumb" to follow are:


* Most important fields are STUFF and STUFF ATTRIBUTE.
You must have both of these to have a meaningful
quiddity, just like you must have a subject and
a verb to have a complete sentence. There is one
and only one value for these components in each
quiddity expression!

* Capture "meaning" of what the data element represents.

* When determining quiddity, look at the definition of the
data elements, rather than the names of the data
elements themselves.

* Some data element names are deceptive/un-informative.
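
To make the five components and the first rule of thumb concrete, here is a small Python sketch of a quiddity as a record. It is only an illustration: the class name `Quiddity` and its field names are assumptions chosen to mirror the component names, and the sample values are those of one of the experiment quiddities for the FIRSTNAME data element.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Quiddity:
    """One quiddity expression for a data element (illustrative only)."""
    stuff: str             # What is it about?
    stuff_attribute: str   # What is it about the stuff you are interested in?
    arity: List[str] = field(default_factory=list)                 # What is the stuff a function of?
    stuff_type: List[str] = field(default_factory=list)            # What sort of stuff is it?
    stuff_attribute_type: List[str] = field(default_factory=list)  # What sort of stuff attribute is it?

    def __post_init__(self):
        # Rule of thumb: STUFF and STUFF ATTRIBUTE must both be present.
        if not self.stuff or not self.stuff_attribute:
            raise ValueError("stuff and stuff_attribute are both required")


# FIRSTNAME: the given name (stuff attribute "name", type "given") of a person.
firstname = Quiddity(stuff="person", stuff_attribute="name",
                     stuff_attribute_type=["given"])
print(firstname)
```

The three optional components default to empty lists because, in the experiment data, they may hold zero, one, or several terms, while stuff and stuff attribute always hold exactly one.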


2. New Approach

As stated earlier, the two most important components of quiddity are
stuff and stuff attribute. If we can find these, we will have captured the
data element meaning. Most people seem to have difficulty distinguishing
between the two components. I have developed some new questions to ask
yourself when defining these components. I hope these questions together with
the above method will clarify the concept.


a. The idea is to find the stuff attribute first. Once this is
done, the stuff component follows naturally. Try these steps:


* Look at a collection of actual data contained in the field 


* Classify the data by grouping the collection under a 
general heading or name which answers the question "What 
is it?" or "What are these?" What do you actually see in 
the field? We want to categorize the actual words, 
codes, numbers, etc., that we see in the field. The data 
is a MEASURE of something. The MEASURE is the stuff 
attribute and the SOMETHING is the stuff! We are not 
concerned with what the data are representative of in the 
physical or concrete sense. We are looking for an 
abstract noun. Stuff attribute is not a dimension!! 


Some examples are: 


The data in the field looks like this "$23.34". This is the 
cost (stuff attribute) of SOMETHING (stuff). 


The data in the field looks like this "sofa", "chair", "TV",
"table", etc. You might be tempted to say that the "category"
of this data is "furniture" but you would be wrong! We want
to capture a measure of the data, not what the data represents
in the physical sense. What measure is this, or what are
they? They are NAMES! Names of what? Property! So, the
stuff attribute is "name" and the stuff is "property". The
questions above are also answered!


(Note: You will still have the data dictionary to look at too!)

WORK SHEET¹
(Updated for Experiment #2)


1. How do we capture the "meaning" of what a data element represents?


a. A proposed method for capturing this "meaning" uses a type of formal
"language" with various rules for forming the definition of the "meaning."
This definition or description is called "quiddity."


"From the Oxford English Dictionary, quiddity is 'The real nature or 
essence of a thing; that which makes a thing what it is.' Of course, ... 
[the proposed] language for expressing quiddities is only a model, or 
approximation, of genuine quiddity, if it exists." 


Example 1:

* DATABASE 1
- Variable: purchase_cost
- Description: "Purchase cost of a truck"

* DATABASE 2
- Variable: cost_of_purchase
- Description: "Cost of purchase of a truck"


b. Let's begin defining the basic component of quiddity.


Example 1a:

* DATABASE 1
- Variable: purchase_cost
- Description: "Purchase cost of a truck"
- Dimension: currency
- Stuff: truck

* DATABASE 2
- Variable: cost_of_purchase
- Description: "Cost of purchase of a truck"
- Dimension: currency
- Stuff: truck


¹All examples and quotes in this work sheet have been borrowed from the
following reference: Bhargava, Hemant K., Steven O. Kimbrough, and Ramayya
Krishnan, Unique Names Violations: A Problem for Model Integration or You Say
Tomato, I Say Tomahto (University of Pennsylvania, Department of Decision
Sciences, Working Paper, 1990, forthcoming, ORSA Journal on Computing, Spring
1991), pp. 5-8.

2. Now, let's change the variables slightly.


a. Notice that the description changed but not the "dimension" or
"stuff".


Example 2:

* DATABASE 1
- Variable: purchase_cost
- Description: "Cost of purchasing a truck"
- Dimension: currency
- Stuff: truck

* DATABASE 2
- Variable: production_cost
- Description: "Cost of producing a truck"
- Dimension: currency
- Stuff: truck


b. What is the quiddity? 


Sample line of reasoning used in Example 2a to describe "quiddity." 


Both variables are about the same stuff: trucks. They differ in 
what it is they represent about trucks. What is it about trucks 
they describe? Cost, in both cases. What kind or sort of cost are 


we interested in? Purchasing in one case and production in the 
other. This line of reasoning suggests the quiddity descriptions in 
Example 2a. 





Example 2a: 


@ DATABASE 1 
- Variable: purchase_cost 
- Description: "Cost of purchasing a truck" 
- Dimension: currency 


Quiddity: cost(purchase(truck)) 
cost = STUFF ATTRIBUTE 
purchase = STUFF ATTRIBUTE TYPE 
truck = STUFF 


Quiddity Paraphrase: "the purchase cost of a truck" 


@ DATABASE 2 
- Variable: production_cost 
- Description: "Cost of producing a truck" 
- Dimension: currency 


Quiddity: cost(production(truck)) 
cost = STUFF ATTRIBUTE 
production = STUFF ATTRIBUTE TYPE 
truck = STUFF 


Quiddity Paraphrase: "the production cost of a truck" 
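
The nested notation used in Example 2a can be modeled as a small record. The following is only a minimal sketch: the `Quiddity` class, its field names, and the `notation` method are hypothetical names introduced for illustration and do not appear in the work sheet. It builds the two quiddities from Example 2a and shows that they share the same stuff but are not the same quiddity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Quiddity:
    """Hypothetical record for the quiddity components shown in Example 2a."""
    stuff: str                                    # what the variable is about
    stuff_attribute: str                          # what is recorded about the stuff
    stuff_attribute_types: Tuple[str, ...] = ()   # the kind or sort of attribute

    def notation(self) -> str:
        """Render as attribute(type(stuff)), the form used in Example 2a."""
        inner = self.stuff
        for attr_type in reversed(self.stuff_attribute_types):
            inner = f"{attr_type}({inner})"
        return f"{self.stuff_attribute}({inner})"


purchase_cost = Quiddity("truck", "cost", ("purchase",))
production_cost = Quiddity("truck", "cost", ("production",))

print(purchase_cost.notation())                       # cost(purchase(truck))
print(production_cost.notation())                     # cost(production(truck))
print(purchase_cost.stuff == production_cost.stuff)   # True: same stuff
print(purchase_cost == production_cost)               # False: different quiddity
```

Comparing whole records, rather than variable names or descriptions, is what lets the two "cost" variables be told apart even though their dimension and stuff agree.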
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3. Below is a list of several data elements in a "Home Inventory" database. 
See if you can define the quiddity for each data element. 


DATA DICTIONARY EXCERPT: 
FIELD NAME     TYPE              DESCRIPTION 


ITEM           1{character}40    Identifies a specific piece of 
                                 property, i.e., sofa, dining room 
                                 chair, TV, etc. 


QUANTITY       1{integer}3       Identifies the total number of like 
                                 items or pieces of property owned, 
                                 i.e., "2" if two sofas are owned. 


VALUE          1{integer}8       Identifies the current replacement cost 
                                 of a specific piece of property. 


DATE           1{date}8          The month, day, and year the property 
                                 was purchased or acquired. 


PRICE          1{integer}8       Identifies the amount paid for a 
                                 specific piece of property. 


WEIGHT         1{integer}5       The total number of pounds a specific 
                                 piece of property weighs. 


FREE WEIGHT    1{logical}1       Whether weight of a specific piece of 
                                 property applies toward the 
                                 professional weight allowance or not, 
                                 i.e., "Y" if yes or "N" if no. 





[Work sheet table illegible in source; surviving terms: like, professional, property] 





EXPERIMENT #2 


Instructions: 


1. Determine the quiddity for each data element listed. Record the 
components of the quiddity in the appropriate columns of the row listing the 
data element. Please write legibly. 


2. Please keep track of the order in which you determine the quiddity 
components for each data element by placing a number in the upper left corner 
of the appropriate "box" in the table. For example, if the first term you 
define for the first data element is its stuff, the second term is its stuff 
type, and the third term is its stuff attribute, the table would look like 
this: 





[Example table row illegible in source; surviving terms: household, retail] 


3. When defining quiddities, you must have exactly ONE "stuff" component term 
and exactly ONE "stuff attribute" component term. However, the components 
ARITY, STUFF TYPE, and STUFF ATTRIBUTE TYPE may be left blank or have one or 
more terms for each quiddity, depending on the definition you are writing. 

If there is more than one term, list them together in the appropriate "box" 
and place each term's ordering number to its left in the box. 
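
The rule in instruction 3 can be checked mechanically. The sketch below is illustrative only: the function name `check_quiddity`, the dictionary layout, and the lower-case component keys are assumptions made for this example, not part of the experiment materials.

```python
from typing import Dict, List

# Components that must hold exactly one term, per instruction 3.
REQUIRED_SINGLE = ("stuff", "stuff_attribute")
# Components that may be blank or hold one or more terms.
OPTIONAL_MULTI = ("arity", "stuff_type", "stuff_attribute_type")


def check_quiddity(row: Dict[str, List[str]]) -> List[str]:
    """Return the rule violations for one work sheet row (empty list if valid)."""
    problems = []
    for name in REQUIRED_SINGLE:
        count = len(row.get(name, []))
        if count != 1:
            problems.append(f"{name}: expected exactly 1 term, found {count}")
    for name in row:
        if name not in REQUIRED_SINGLE and name not in OPTIONAL_MULTI:
            problems.append(f"{name}: unknown component")
    return problems


ok_row = {"stuff": ["truck"], "stuff_attribute": ["cost"],
          "stuff_attribute_type": ["purchase"]}
bad_row = {"stuff": ["truck", "car"], "stuff_type": ["household"]}

print(check_quiddity(ok_row))    # []
print(check_quiddity(bad_row))   # two violations: stuff and stuff_attribute
```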


4. I am interested in the "method" you use in determining the quiddity, 
particularly in the "thought process" you go through in working through this 
experiment. Please jot down the method you found most helpful in determining 
the quiddities. Any comments or suggestions you have (even in bullet form) are 
appreciated. 


COMMENTS: 
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NAC DATABASE 


QUIDDITY 











CRAFT DATABASE 


STUFF ATTRIBUTE 





QTR NAME 
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STUFF: 


completion 
course 
curriculum 
degree 
department 
emphasis area 
manager 
person 
professor 
quarter 
section 
student 
time period 


ARITY ARGUMENTS: 


course 
curriculum 
student 
time 


VOCABULARY 


STUFF TYPES: 


class 

course 

current 
curriculum 
emphasis_area 
military officer 
NPS 

prerequisite 
required 
university 


STUFF ATTRIBUTE TYPES: 


given 

office 

required 

social security 
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STUFF ATTRIBUTES: 


credits 
designator 
identifier 

name 

quarter 

surname 
telephone number 
term 

title 

year 


NAC DATABASE DATA DICTIONARY 


FIELD NAME    TYPE             DESCRIPTION 


CRS           1{DIGIT}4        Four digit number assigned to a 
                               particular course of instruction at 
                               the Naval Postgraduate School. 


CURR_OFCR     STRING           Military officer assigned to manage 
                               a particular curriculum. 


CURRIC_NAM    STRING           Title of curriculum course of study. 


DEGREE TIT    STRING           Title of Degree which can be awarded 
                               by the Naval Postgraduate School. 


DPT           1{character}2    Two letter code that represents a 
                               particular department of the Naval 
                               Postgraduate School. 


EMPH AREA     STRING           Name of an emphasis area of study 
                               that students may elect courses from 
                               as a sub-specialty area within a 
                               particular curriculum. 


FIRSTNAME     STRING           Person's given name. 


HOURS         1{DIGIT}1        Credit assigned to each course of 
                               instruction that meets graduation 
                               and degree requirements of so many 
                               credit hours. 


LASTNAME      STRING           Surname of any person at Naval 
                               Postgraduate School. 


LENGTH        1{DIGIT}2        Length of time (in months) required 
                               to complete course of study in a 
                               particular curriculum. 


PREQ CRS      1{DIGIT}4        Course number that when combined 
                               with Prerequisite Department, 
                               identifies a course that meets the 
                               requirements of another course. 


PREQ DPT      1{STRING}2       2 letter code for a prerequisite 
                               department. 


PROF PHONE    + 1{DIGIT}3 +    Telephone number of a particular 
              + 1{DIGIT}3 +    professor's office. 
              1{DIGIT}4 


REQ CRS       1{DIGIT}4        Four digit number representing 
                               courses that are required for a 
                               particular emphasis area. 


SECTION       1{DIGIT}1        Time period in which a given course 
                               is taught. 


CRAFT DATABASE DATA DICTIONARY 


FIELD NAME      TYPE               DESCRIPTION 


SSN             000..999 + "-" +   Identifies students uniquely with 
                00..99 + "-" +     their social security numbers. 
                0000..9999 


LASTNAME        1{CHARACTER}15     Identifies the last name of an individual. 


FIRSTNAME       1{CHARACTER}15     Identifies the first name of an 
                                   individual. 


SECT            1{CHARACTER}4      Identifies the section within a curriculum 
                                   that a student is assigned to. 


DEPT            1{CHARACTER}2      The unique two letter code which 
                                   identifies an Academic Department at the 
                                   Naval Postgraduate School. 


CRS_NUMBER      1{DIGIT}4          Four digit number assigned to a particular 
                                   course of instruction at the Naval 
                                   Postgraduate School. 


CRS_NAME        1{CHARACTER}25     Name assigned to a particular course of 
                                   study at the Naval Postgraduate School. 


EMPH            1{CHARACTER}3      Three letter code identifying an emphasis 
                                   area course of study at the Naval 
                                   Postgraduate School. 


EMPH_NAME       1{CHARACTER}25     Name assigned to a particular emphasis 
                                   area course of study at the Naval 
                                   Postgraduate School. 


QTR NAME        1{CHARACTER}6      Name given to each academic quarter of the 
                                   school year, i.e., Fall, Winter, Spring, 
                                   and Summer. 


QTR             1{DIGIT}2          Two digit code identifying a particular 
                                   academic quarter of the school year, i.e., 
                                   01=Fall, 02=Winter, 03=Spring, and 
                                   04=Summer. 


YR              1{DIGIT}2          Two digit code identifying a particular 
                                   year a student completed a course, i.e., 
                                   90=1990, 91=1991, etc. 


PRE REQ DEPT    1{CHARACTER}2      Two letter code identifying a Prerequisite 
                                   Department within the Naval Postgraduate 
                                   School. 


PRE REQ NUM     1{DIGIT}4          Four digit number assigned to a particular 
                                   prerequisite course of instruction at the 
                                   Naval Postgraduate School. 


COREQTR         1{DIGIT}2          Two digit code which indicates which 
                                   quarter of the curriculum a student is 
                                   currently in, i.e., 01, 02, 03, 04, 05, 
                                   and 06. 
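
The TYPE entries in these data dictionaries follow a repetition pattern such as 1{DIGIT}4. As a rough sketch, and assuming the common structured-analysis reading in which m{ITEM}n means between m and n repetitions of ITEM, such an entry can be turned into a validation pattern. The function name `type_to_regex` and the character sets chosen for DIGIT and CHARACTER are assumptions made for this illustration; the data dictionaries do not define them.

```python
import re

# Assumed alphabets for the item names; the data dictionaries do not spell these out.
ITEM_CLASSES = {"DIGIT": r"[0-9]", "CHARACTER": r"[A-Za-z ]"}


def type_to_regex(spec: str) -> re.Pattern:
    """Translate a spec such as '1{DIGIT}4' into a compiled regular expression."""
    match = re.fullmatch(r"\s*(\d+)\s*\{\s*(\w+)\s*\}\s*(\d+)\s*", spec)
    if match is None:
        raise ValueError(f"unsupported type spec: {spec!r}")
    low, high = int(match.group(1)), int(match.group(3))
    item = match.group(2).upper()
    if item not in ITEM_CLASSES:
        raise ValueError(f"unknown item name: {item!r}")
    # Builds, for example, "[0-9]{1,4}".
    return re.compile(f"{ITEM_CLASSES[item]}{{{low},{high}}}")


crs_number = type_to_regex("1{DIGIT}4")
dept = type_to_regex("1{CHARACTER}2")

print(bool(crs_number.fullmatch("3010")))    # True
print(bool(crs_number.fullmatch("30100")))   # False: more than four digits
print(bool(dept.fullmatch("CS")))            # True
```

Under this reading a one-digit value would also satisfy 1{DIGIT}4, although the description says "Four digit number"; a stricter check would use the description rather than the TYPE entry alone.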
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Output Reports The predefined functions of the NAC are provided via use of 
established queries and designed reports which can be accessed from the dBase 
IV Control Center. These reports and their associated functions are listed in 
the following table: 


Report      Predefined Function 


AVAL 367    Available Courses for Curriculum 367 which 
            are offered in the Summer Quarter 


PREREQ      List of Prerequisite Courses for a 
            specified Course or group of Courses 


THESIS      Thesis Support and Special Interests for a 
            specified Professor 


DEG INFO    Information related to a specified Degree 
            Program 


CRS TAUT    List of Courses Taught by a specified 
            Professor 


EMPH CSE    Requirements for Specified Emphasis Area 


Examples of these reports have been attached. 
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SAMPLE NAC DATABASE REPORTS 


AVAL 367 
Curriculum 367 Courses Available for Summer Quarter 
Course   Section  Professor  Description 
CS3010   1        Stevens    Computing Devices and Systems 
IS3170   1        Haga       Economic Evaluation of IS 
IS4185   1        Bui        Decision Support Systems 


CRS TAUT 
Courses Taught by: Bui, Tung 
Office: 1320 Phone: (408) 646-3260 

                                      Quarter/s offered 
Course   Description                  W   Sp  Su  F 
IS4200   System Analysis and Design   N   Y   N   Y 
IS4185   Decision Support Systems     Y   N   Y   N 


DEG INFO 
Degrees Offered by Curriculum: 367 
Degree Title: MS in Information System Management 
Curriculum: Computer Systems Management 
APC: 335 P-Code: 0095P Length: 18 Months 
Convenes: Winter-N, Spring-Y, Summer-N, Fall-Y 
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SAMPLE NAC DATABASE REPORTS 


EMPH CSE 
Emphasis Requirements for: IRM 

Required  Option   Elective 
Courses   Course   Courses 
IS3220    IS4184   CS4601 
IS3220    IS4184   IS3000 


PREREQ 
Course Prerequisites 
          Prereq   Option 
Course    Course   Course   Remarks 
MN4154    MN2155   MN3161 
IS3503    IS3502            Can be concurrent 
THESIS 


Specialized Data for: Haga, William 
Office: 1218 Phone: (408) 646-3094 


Sabbaticals: NONE 


Special Duties: Adjunct Professor of Management Information 
Systems, Naval Postgraduate School. 


Special Interest: Studying the research methods used to gauge the 
success of information systems. He is 
interested in the by-products of systems 
implementations on small groups. His other 
research includes the relationship of 
organizational structure and culture to 
information system success. 


Published works: Academy of management review, Accounting 
Reviews, American Journal of Economics and 
Sociology, American Sociological review, 
Astronautics and aeronautics, Behavioral 
Science, Computers and Security, Data Processing 
and Communication Security, Journal of 
Contemporary Sociology, Journal of the System 
Safety Society and Organizational Behavior and 
Human Performance. 
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Page No. l 


05/19/90 

SAMPLE CRAFT DATABASE REPORTS 


STUDENT 
SSN        LASTNAME  FIRSTNAME  SECTION  EMPH  COREQTR 
223747355  APPLE     PAUL       PL01     DSS   01 
270646400  GREEN     SALLY      PL01     MET   01 
555229999  JONES     BILL       PL01     DSS   02 
222774444  MARS      BRYON      PL03     TAG   06 
111884422  FRANTZ    JOAN       PL03     IRM   04 


Page No. 1 

EMPHASIS 
DEPT  CRS_NUMBER 
CS    2972 
CS    3010 
CS    3030 
IS    2000 
IS    2100 
IS    3000 
IS    3000 

CRS_NAME 
ADA FROM THE BEGINNING 
ADA FROM THE BEGINNING 
SOFTWARE DEVELOPMENT 

PRE REQ DEPT  PRE_REQ NUM 
CS            2970 
CS            2970 
CS            3010 
CS            3010 
IS            3170 
NAC DATABASE 
Subject A 








CURR_OFCR manager curriculum officer surname 
[rows illegible in source] 


telephone 
PROF PHONE professor number 
[row illegible in source] 
[partly illegible] time period | curriculum | designator 


prerequisite 
PREQ CRS course NPS designator 
prerequisite 
PREQ DPT department NPS identifier 
required 
course NPS designator 
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NAC DATABASE 
Subject B 


military 
CURR OFCR manager curriculum officer 
[rows illegible in source] 


telephone | office 
PROF PHONE professor number 
[partly illegible] curriculum | emphasis_area | credits | required 
[row illegible in source] 


prerequisite 
PREQ CRS course required  |designator 
prerequisite 
PREQ DPT department required  |identifier 
required 
emphasis area course designator 
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NAC DATABASE 
Subject C 





CURR OFCR person officer surname 
[rows illegible in source] 


PROF_PHONE professor telephone office 
number 
[row illegible in source] 








CRAFT DATABASE 
Subject 1 


[column headings illegible in source] 






student identifier security 
[rows illegible in source] 


prerequisite 

PRE REQ DEPT department NPS identifier 
prerequisite 

PRE REQ NUM course NPS identifier 


identifier 




















COREQTR identifier 








quarter student current 
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CRAFT DATABASE 
Subject 2 


[column headings illegible in source] 








military | identifier | social 
officer 


security 


military 
student officer surname 














military 
student officer 


section curriculum identifier 
[rows illegible in source] 


curriculum 
COREQTR quarter time identifier 
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CRAFT DATABASE 
Subject 3 


[rows illegible in source] 


prerequisite 
PRE_REQ DEPT department NPS designator 


PRE REQ NUM course NPS designator 
[rows illegible in source] 
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NAC DATABASE 
MASTER 





CURR_OFCR manager curriculum 
[rows illegible in source] 


telephone 
PROF PHONE professor number office 
[partly illegible] emphasis area | curriculum | title 
[partly illegible] completion | curriculum | term | required 


NPS 
PREQ DPT department — prerequisite 
required 
course emphasis area | identifier 
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CRAFT DATABASE 
MASTER 


[column headings illegible in source] 





student identifier | security 
[rows illegible in source] 


PRE REQ DEPT department — prerequisite| identifier 


NPS 


course 
completion student year 
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NAC / CRAFT DATABASE COMPARISONS BY PROCEDURE (Master) 


Synonyms -- Master                Synonyms -- Master 
Test    # Found Type I Type II    Test    # Found Type I Type II 
111           1      0       4    111           1      0       4 
122           5      2       2    211           3      1       3 
132           5      2       2    311          13     11       3 
143           5      2       2    122           5      2       2 
211           3      1       3    222          11      7       1 
222          11      7       1    322          31     27       1 
232          11      7       1    132           5      2       2 
243          11      7       1    232          11      7       1 
311          13     11       3    332          31     27       1 
322          31     27       1    143           5      2       2 
332          31     27       1    243          11      7       1 
343          44     39       0    343          44     39       0 


Homonyms -- Master                Homonyms -- Master 
Test    # Found Type I Type II    Test    # Found Type I Type II 
111           3      0       0    111           3      0       0 
122           3      0       0    211           3      0       0 
132           3      0       0    311           3      0       0 
143           3      0       0    122           3      0       0 
211           3      0       0    222           3      0       0 
222           3      0       0    322           1      0       2 
232           3      0       0    132           3      0       0 
243           3      0       0    232           3      0       0 
311           3      0       0    332           1      0       2 
322           1      0       2    143           3      0       0 
332           1      0       2    243           3      0       0 
343           0      0       3    343           0      0       3 
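
The three columns in these tables can be read as a comparison between the pairs a procedure reports and a master list of correct pairs. The sketch below assumes that a Type I error is a reported pair that is not in the master list and a Type II error is a master pair that was not reported; this reading is an inference from the table (in every row, # Found minus Type I plus Type II is constant), not a definition given on this page. The function name `score` and the example pairs are hypothetical and are not the master list used in the experiment.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]   # (NAC field name, CRAFT field name)


def score(found: Set[Pair], master: Set[Pair]) -> Dict[str, int]:
    """Count reported pairs and the two kinds of error against a master list."""
    return {
        "found": len(found),
        "type_1": len(found - master),   # reported, but not a correct pair
        "type_2": len(master - found),   # correct pair that was missed
    }


# Illustrative pairs only.
master = {("DPT", "DEPT"), ("CRS", "CRS_NUMBER"), ("PREQ DPT", "PRE REQ DEPT")}
found = {("DPT", "DEPT"), ("CRS", "PRE REQ NUM")}

result = score(found, master)
print(result)   # {'found': 2, 'type_1': 1, 'type_2': 2}
# Under this reading, found - type_1 + type_2 always equals the master list size.
print(result["found"] - result["type_1"] + result["type_2"] == len(master))   # True
```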
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NAC / CRAFT DATABASE COMPARISONS BY PROCEDURE 
(Average Totals of the 9 Comparisons for Each Procedure) 


Synonyms -- Experiment            Synonyms -- Experiment 
Test    # Found Type I Type II    Test    # Found Type I Type II 
111         1.0    0.0     4.0    111         1.0    0.0     4.0 
211         2.9    0.7     2.8    122         2.3    1.0     3.7 
311        10.4    8.2     2.8    132         2.3    1.0     3.7 
122         2.3    1.0     3.7    143         2.7    1.3     3.7 
222        11.1    7.4     1.3    211         2.9    0.7     2.8 
322        35.4   31.3     0.9    222        11.1    7.4     1.3 
132         2.3    1.0     3.7    232        10.8    7.1     1.3 
232        10.8    7.1     1.3    243        13.0    9.3     1.3 
332        31.9   27.8     0.9    311        10.4    8.2     2.8 
143         2.7    1.3     3.7    322        35.4   31.3     0.9 
243        13.0    9.3     1.3    332        31.9   27.8     0.9 
343        46.1   41.4     0.3    343        46.1   41.4     0.3 


Homonyms -- Experiment            Homonyms -- Experiment 
Test    # Found Type I Type II    Test    # Found Type I Type II 
111         2.8    0.0     0.2    111         2.8    0.0     0.2 
211         2.8    0.0     0.2    122         2.4    0.0     0.6 
311         2.6    0.0     0.4    132         2.4    0.0     0.6 
122         2.4    0.0     0.6    143         2.3    0.0     0.7 
222         2.4    0.0     0.6    211         2.8    0.0     0.2 
322         1.3    0.0     1.7    222         2.4    0.0     0.6 
132         2.4    0.0     0.6    232         2.4    0.0     0.6 
232         2.4    0.0     0.6    243         2.3    0.0     0.7 
332         1.3    0.0     1.7    311         2.6    0.0     0.4 
143         2.3    0.0     0.7    322         1.3    0.0     1.7 
243         2.3    0.0     0.7    332         1.3    0.0     1.7 
343         0.7    0.0     2.3    343         0.7    0.0     2.3 
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Synonyms —— Equivalence Rule Comparison 


# Of Data Element Pairs 


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Figure 20 Synonyms -- Procedure Comparison (Component Rules 22) 
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Synonyms —— Equivalence Rule Comparison 


# Of Data Element Pairs 







Figure 22 Synonyms -- Procedure Comparison (Component Rules 43) 
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Figure 24 Homonyms -- Procedure Com 


[Figure residue: bar charts; x-axis "Equivalence Procedures", y-axis "# Of Data Element Pairs", legend "# Homonyms Found" and "Type I Errors"] 


Figure 25 Homonyms -- Procedure Comparison 


Figure 26 Homonyms -- Procedure Comparison (Component Rules 43) 



NAC / CRAFT DATABASE COMPARISONS BY PROCEDURE 


Synonyms                          Synonyms 
Test-111                          Test-211 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            2      0       3    A1            5      1       1 
A2            1      0       4    A2            1      0       4 
A3            2      0       3    A3            5      1       1 
B1            1      0       4    B1            3      1       3 
B2            1      0       4    B2            1      0       4 
B3            1      0       4    B3            3      1       3 
C1            0      0       5    C1            3      1       3 
C2            1      0       4    C2            3      1       3 
C3            0      0       5    C3            2      0       3 
Sum           9      0      36    Sum          26      6      25 
Average     1.0    0.0     4.0    Average     2.9    0.7     2.8 

Synonyms                          Synonyms 
Test-122                          Test-222 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            5      3       3    A1           18     13       0 
A2            2      1       4    A2           14      9       0 
A3            4      2       3    A3           18     13       0 
B1            2      1       4    B1            7      5       3 
B2            1      0       4    B2            8      4       1 
B3            2      1       4    B3            7      5       3 
C1            1      0       4    C1            8      5       2 
C2            2      1       4    C2            8      5       2 
C3            2      0       3    C3           12      8       1 
Sum          21      9      33    Sum         100     67      12 
Average     2.3    1.0     3.7    Average    11.1    7.4     1.3 
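
The Sum and Average rows can be recomputed from the nine database-pair rows, which is also a convenient check on individual entries. The sketch below uses the Test-211 synonym figures from the table above; the variable names are chosen for this illustration only.

```python
# (# Found, Type I, Type II) for each database pair, copied from the Test-211 table.
test_211 = {
    "A1": (5, 1, 1), "A2": (1, 0, 4), "A3": (5, 1, 1),
    "B1": (3, 1, 3), "B2": (1, 0, 4), "B3": (3, 1, 3),
    "C1": (3, 1, 3), "C2": (3, 1, 3), "C3": (2, 0, 3),
}

columns = list(zip(*test_211.values()))          # one tuple per table column
sums = [sum(col) for col in columns]
averages = [round(sum(col) / len(col), 1) for col in columns]

print(sums)       # [26, 6, 25]
print(averages)   # [2.9, 0.7, 2.8]

# In these rows, # Found - Type I + Type II takes a single value for every pair.
print({found - t1 + t2 for found, t1, t2 in test_211.values()})   # {5}
```

The constant value in the last line suggests that each database pair is scored against the same number of correct synonym pairs, although that interpretation is inferred from the figures rather than stated with the table.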
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NAC/ CRAFT DATABASE COMPARISONS BY PROCEDURE 


Synonyms                          Synonyms 
Test-132                          Test-232 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            5      3       3    A1           18     13       0 
A2            2      1       4    A2           13      8       0 
A3            4      2       3    A3           18     13       0 
B1            2      1       4    B1            7      5       3 
B2            1      0       4    B2            7      3       1 
B3            2      1       4    B3            7      5       3 
C1            1      0       4    C1            8      5       2 
C2            2      1       4    C2            8      5       2 
C3            2      0       3    C3           11      7       1 
Sum          21      9      33    Sum          97     64      12 
Average     2.3    1.0     3.7    Average    10.8    7.1     1.3 

Synonyms                          Synonyms 
Test-143                          Test-243 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            5      3       3    A1           18     13       0 
A2            2      1       4    A2           17     12       0 
A3            4      2       3    A3           18     13       0 
B1            2      1       4    B1            7      5       3 
B2            1      0       4    B2           12      8       1 
B3            2      1       4    B3            7      5       3 
C1            2      1       4    C1           10      7       2 
C2            2      1       4    C2            8      5       2 
C3            4      2       3    C3           20     16       1 
Sum          24     12      33    Sum         117     84      12 
Average     2.7    1.3     3.7    Average    13.0    9.3     1.3 
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NAC / CRAFT DATABASE COMPARISONS BY PROCEDURE 


Synonyms                          Synonyms 
Test-311                          Test-332 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1           17     13       1    A1           49     44       0 
A2            3      2       4    A2           30     25       0 
A3           20     16       1    A3           53     48       0 
B1           16     14       3    B1           30     26       1 
B2            3      2       4    B2           22     18       1 
B3           13     11       3    B3           29     25       1 
C1            8      6       3    C1           22     19       2 
C2           10      8       3    C2           20     17       2 
C3            4      2       3    C3           32     28       1 
Sum          94     74      25    Sum         287    250       8 
Average    10.4    8.2     2.8    Average    31.9   27.8     0.9 

Synonyms                          Synonyms 
Test-322                          Test-343 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1           49     44       0    A1           57     52       0 
A2           36     31       0    A2           51     46       0 
A3           53     48       0    A3           53     48       0 
B1           30     26       1    B1           32     28       1 
B2           27     23       1    B2           38     34       1 
B3           29     25       1    B3           36     32       1 
C1           29     26       2    C1           46     41       0 
C2           28     25       2    C2           44     39       0 
C3           38     34       1    C3           58     53       0 
Sum         319    282       8    Sum         415    373       3 
Average    35.4   31.3     0.9    Average    46.1   41.4     0.3 




NAC / CRAFT DATABASE COMPARISONS BY PROCEDURE 


Homonyms                          Homonyms 
Test-111                          Test-211 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            3      0       0    A1            3      0       0 
A2            3      0       0    A2            3      0       0 
A3            3      0       0    A3            3      0       0 
B1            3      0       0    B1            3      0       0 
B2            3      0       0    B2            3      0       0 
B3            2      0       1    B3            2      0       1 
C1            2      0       1    C1            2      0       1 
C2            3      0       0    C2            3      0       0 
C3            3      0       0    C3            3      0       0 
Sum          25      0       2    Sum          25      0       2 
Average     2.8    0.0     0.2    Average     2.8    0.0     0.2 

Homonyms                          Homonyms 
Test-122                          Test-222 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            1      0       2    A1            1      0       2 
A2            3      0       0    A2            3      0       0 
A3            3      0       0    A3            3      0       0 
B1            3      0       0    B1            3      0       0 
B2            3      0       0    B2            3      0       0 
B3            1      0       2    B3            1      0       2 
C1            2      0       1    C1            2      0       1 
C2            3      0       0    C2            3      0       0 
C3            3      0       0    C3            3      0       0 
Sum          22      0       5    Sum          22      0       5 
Average     2.4    0.0     0.6    Average     2.4    0.0     0.6 
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NAC / CRAFT DATABASE COMPARISONS BY PROCEDURE 


Homonyms                          Homonyms 
Test-132                          Test-232 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            1      0       2    A1            1      0       2 
A2            3      0       0    A2            3      0       0 
A3            3      0       0    A3            3      0       0 
B1            3      0       0    B1            3      0       0 
B2            3      0       0    B2            3      0       0 
B3            1      0       2    B3            1      0       2 
C1            2      0       1    C1            2      0       1 
C2            3      0       0    C2            3      0       0 
C3            3      0       0    C3            3      0       0 
Sum          22      0       5    Sum          22      0       5 
Average     2.4    0.0     0.6    Average     2.4    0.0     0.6 

Homonyms                          Homonyms 
Test-143                          Test-243 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            1      0       2    A1            1      0       2 
A2            3      0       0    A2            3      0       0 
A3            3      0       0    A3            3      0       0 
B1            3      0       0    B1            3      0       0 
B2            3      0       0    B2            3      0       0 
B3            1      0       2    B3            1      0       2 
C1            1      0       2    C1            1      0       2 
C2            3      0       0    C2            3      0       0 
C3            3      0       0    C3            3      0       0 
Sum          21      0       6    Sum          21      0       6 
Average     2.3    0.0     0.7    Average     2.3    0.0     0.7 
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NAC / CRAFT DATABASE COMPARISONS BY PROCEDURE 


Homonyms                          Homonyms 
Test-311                          Test-332 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            3      0       0    A1            1      0       2 
A2            3      0       0    A2            3      0       0 
A3            3      0       0    A3            1      0       2 
B1            2      0       1    B1            1      0       2 
B2            3      0       0    B2            2      0       1 
B3            2      0       1    B3            1      0       2 
C1            2      0       1    C1            1      0       2 
C2            2      0       1    C2            1      0       2 
C3            3      0       0    C3            1      0       2 
Sum          23      0       4    Sum          12      0      15 
Average     2.6    0.0     0.4    Average     1.3    0.0     1.7 

Homonyms                          Homonyms 
Test-322                          Test-343 
DB-Pair # Found Type I Type II    DB-Pair # Found Type I Type II 
A1            1      0       2    A1            0      0       3 
A2            3      0       0    A2            1      0       2 
A3            1      0       2    A3            1      0       2 
B1            1      0       2    B1            1      0       2 
B2            2      0       1    B2            1      0       2 
B3            1      0       2    B3            0      0       3 
C1            1      0       2    C1            0      0       3 
C2            1      0       2    C2            1      0       2 
C3            1      0       2    C3            1      0       2 
Sum          12      0      15    Sum           6      0      21 
Average     1.3    0.0     1.7    Average     0.7    0.0     2.3 
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WITHIN DATABASE COMPARISONS 


Within Database Comparison -- NAC 


        No. of     No. of     No. of 
Test    Homonyms   Synonyms   Matches 
111        35          0         10 
211        30          0         15 
311        30         15         15 
122        26          4         19 
222        16          8         29 
322        10         69         35 
132        26          4         19 
232        18          8         27 
332        12         60         32 
143        22          4         23 
243        12         10         33 
343         6        110         39 


Within Database Comparison -- CRAFT 


        No. of     No. of     No. of 
Test    Homonyms   Synonyms   Matches 
111        38          0          7 
211        33          6         12 
311        31         28         14 
122        27          5         18 
222        16         43         29 
322        12        111         33 
132        27          5         18 
232        17         41         26 
332        13         99         27 
143        25          8         20 
243        17         46         28 
343         8        128         37 
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Within Database Comparison -- NAC 


        No. of     No. of     No. of 
Test    Homonyms   Synonyms   Matches 
111        35          0         10 
122        26          4         19 
132        26          4         19 
143        22          4         23 
211        30          0         15 
222        16          8         29 
232        18          8         27 
243        12         10         33 
311        30         15         15 
322        10         69         35 
332        12         60         32 
343         6        110         39 


Within Database Comparison -- CRAFT 


        No. of     No. of     No. of 
Test    Homonyms   Synonyms   Matches 
111        38          0          7 
122        27          5         18 
132        27          5         18 
143        25          8         20 
211        33          6         12 
222        16         43         29 
232        17         41         26 
243        17         46         28 
311        31         28         14 
322        12        111         33 
332        13         99         32 
343         8        128         37 
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Figure 27 NAC Quiddity Sameness -- Term Equivalence Rule 1 
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Within Database Comparison 
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